R: Sample selection based on the Kennard-Stone algorithm

ken.sto {soil.spec}

R Documentation

Sample selection based on the Kennard-Stone algorithm

Description

The function selects the most representative samples of a sample set based on a Euclidean distance measure. One can (i) select a number or a percentage of a sample set or (ii) divide a sample set into a calibration set and a representative validation set.

Usage

ken.sto(inp, per = "T", per.n = 0.3, num, va = "F", sav = "T", path = "", out = "Sel")

Arguments

`inp`	a numerical matrix or data.frame containing the input spectra
`per`	a logical value indicating whether the selected samples should be a percentage (given in `per.n`) or a set number (given in `num`) of `inp`. The default `"T"` takes a percentage.
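`per.n`	a numerical value between 0 and 1 giving the fraction of `inp` to select when a percentage is requested via `per`.
`num`	a numerical value between 1 and the sample number minus 1 giving the number of samples to select when a set number is requested via `per`.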
`va`	a logical value indicating whether to select samples out of `inp` or to divide them into a calibration and validation set.
`sav`	a logical value indicating whether the function output shall be saved.
`path`	a character giving the path name where the function output shall be saved.
`out`	a character giving the function output name, in case `sav` is `"T"`.

Details

Sample selection follows a procedure adapted from Kennard & Stone (1969). It is a stepwise procedure: at each step the sample is added that maximizes the Euclidean distance, computed in the space of the important principal components, to the samples already chosen. The number of important principal components is selected so that the increase in cumulative explained variance within the next three components is lower than 4 percent. The starting samples are the two extreme samples (the most negative and the most positive ones) of the important principal components.
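The procedure described above can be sketched in base R. This is a minimal illustration, not the source code of ken.sto: the helper names `n_important_pcs` and `ks_sketch` are hypothetical, the 4 percent rule is read as "the smallest k for which components k+1 to k+3 add less than 0.04 to the cumulative explained variance", the starting samples are read as the lowest and highest scoring sample on each important component, and the stepwise criterion is the standard Kennard-Stone maximin rule. Centering without scaling in `prcomp` is also an assumption.

```r
# Hypothetical helper: number of "important" PCs under one reading of the 4 percent rule
n_important_pcs <- function(pca, window = 3, threshold = 0.04) {
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  n_pc <- length(cum_var)
  for (k in seq_len(n_pc)) {
    upper <- min(k + window, n_pc)
    # gain in cumulative explained variance over the next `window` components
    if (cum_var[upper] - cum_var[k] < threshold) return(k)
  }
  n_pc
}

# Hypothetical sketch of a Kennard-Stone style selection in PC space
ks_sketch <- function(X, n_select) {
  X <- as.matrix(X)
  pca <- prcomp(X, center = TRUE)            # assumption: centered, unscaled PCA
  k <- n_important_pcs(pca)
  scores <- pca$x[, seq_len(k), drop = FALSE]
  d <- as.matrix(dist(scores))               # Euclidean distances between samples

  # assumed start: most negative and most positive sample on each important PC
  start <- unique(as.vector(apply(scores, 2, function(s) c(which.min(s), which.max(s)))))
  selected <- start[seq_len(min(length(start), n_select))]

  while (length(selected) < n_select) {
    remaining <- setdiff(seq_len(nrow(scores)), selected)
    # distance of each remaining sample to its nearest already selected sample
    nearest <- apply(d[remaining, selected, drop = FALSE], 1, min)
    # add the sample that is farthest from everything chosen so far
    selected <- c(selected, remaining[which.max(nearest)])
  }
  list(n_pc = k, rows = selected)
}

set.seed(1)
X <- matrix(rnorm(40 * 6), nrow = 40)        # simulated stand-in for spectra
res <- ks_sketch(X, n_select = round(0.3 * nrow(X)))
```

Here `res$rows` holds the row indices in order of selection and `res$n_pc` the number of components used for the distances.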

With per.n set to 0.4 and va equal to "F", 40 percent of the sample set is selected. With va equal to "T", the validation set comprises 40 percent of the sample set.
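The arithmetic behind the two modes can be illustrated as follows. This is only a sketch: the rounding rule is an assumption, `selected_rows` is a placeholder for the rows the algorithm would pick (drawn at random here purely to have something to split), and treating all remaining rows as the calibration set is an inference from the description rather than documented behaviour.

```r
n_samples <- 50                          # hypothetical number of rows in inp
per.n <- 0.4
n_selected <- round(per.n * n_samples)   # assumption: plain rounding

set.seed(42)
# placeholder for the selected rows; the real function picks them by distance, not at random
selected_rows <- sort(sample(n_samples, n_selected))

# va = "F": the selected rows are simply the chosen subset
chosen_rows <- selected_rows

# va = "T": the selected rows form the validation set, the rest the calibration set (assumed)
validation_rows <- selected_rows
calibration_rows <- setdiff(seq_len(n_samples), validation_rows)
```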

A plot is produced showing the selected samples in the principal component space (only the important PCs). This is the same plot as the one generated by plot.ken.sto.

Value

ken.sto returns a list with class "ken.sto" containing the following components:

`Calibration and validation set`	the logical object `va`.
`Number important PC`	integer giving the number of chosen important components - important for choosing the starting samples.
`PC space important PC`	score value matrix of important principal components.
`Chosen samples names`	chosen sample names when `va` is equal to `"F"`.
`Chosen row number`	chosen row numbers when `va` is equal to `"F"`.
`Chosen calibration sample names`	chosen calibration sample names when `va` is equal to `"T"`.
`Chosen calibration row number`	chosen calibration row numbers when `va` is equal to `"T"`.
`Chosen validation sample names`	chosen validation sample names when `va` is equal to `"T"`.
`Chosen validation row number`	chosen validation row numbers when `va` is equal to `"T"`.
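Because the component names contain spaces, they have to be accessed with `[[ ]]` or with backticks after `$`. The sketch below uses a mock list that imitates the documented structure for `va` equal to `"F"`; the values are invented for illustration, the score matrix component is left out for brevity, and it is assumed that the list element names match the component names given above.

```r
# Mock input with row names standing in for sample names
inp <- matrix(seq_len(12 * 4), nrow = 12,
              dimnames = list(sprintf("s%02d", 1:12), NULL))

# Mock result imitating the documented components (not produced by ken.sto)
res <- structure(
  list(
    "Calibration and validation set" = "F",
    "Number important PC" = 3L,
    "Chosen samples names" = c("s02", "s07", "s11"),
    "Chosen row number" = c(2L, 7L, 11L)
  ),
  class = "ken.sto"
)

# Names with spaces need [[ ]] or backticks
rows <- res[["Chosen row number"]]
n_pc <- res$`Number important PC`

# Subset the input to the chosen samples
chosen <- inp[rows, , drop = FALSE]
```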

Author(s)

Thomas Terhoeven-Urselmans

References

Kennard, R. W. and Stone, L. A. (1969) Computer aided design of experiments. Technometrics 11(1), 137-148.

Examples

## Not run: ken.sto(inp, per = "T", per.n = 0.3, num, va = "F", sav = "T", path = "", out = "Sel")
## Not run: plot.ken.sto(x)  # x: an object of class "ken.sto" as returned by ken.sto()

[Package soil.spec version 0.2.0 Index]