In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra

BACKGROUND: The use of visible-near infrared (vis-NIR) spectroscopy for rapid soil characterisation has gained a lot of interest in recent times. Soil spectra absorbance from the visible-infrared range can be calibrated using regression models to predict a set of soil properties. The accuracy of the...

Bibliographic Details
Main authors:	Ng, Wartini; Minasny, Budiman; Malone, Brendan; Filippi, Patrick
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2018
Subjects:	Soil Science
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6173947/ https://www.ncbi.nlm.nih.gov/pubmed/30310751 http://dx.doi.org/10.7717/peerj.5722

_version_	1783361219977019392
author	Ng, Wartini; Minasny, Budiman; Malone, Brendan; Filippi, Patrick
author_facet	Ng, Wartini; Minasny, Budiman; Malone, Brendan; Filippi, Patrick
author_sort	Ng, Wartini
collection	PubMed
description	BACKGROUND: The use of visible-near infrared (vis-NIR) spectroscopy for rapid soil characterisation has attracted considerable interest in recent years. Soil absorbance spectra from the visible-infrared range can be calibrated with regression models to predict a set of soil properties. The accuracy of these regression models relies heavily on the calibration set. An optimum sample size and good overall representativeness of the calibration samples could further improve model performance. However, there is no guideline on which sampling method should be used for datasets of different sizes. METHODS: Here, we show that different sampling algorithms perform differently under different data sizes and different regression models (Cubist regression tree and Partial Least Squares Regression (PLSR)). We analysed the effect of three sampling algorithms, Kennard-Stone (KS), conditioned Latin Hypercube Sampling (cLHS) and k-means clustering (KM), against random sampling on the prediction of up to five soil properties (sand, clay, carbon content, cation exchange capacity and pH) on three datasets. These datasets have different coverages: a European continental dataset (LUCAS, n = 5,639), a regional dataset from Australia (Geeves, n = 379), and a local dataset from New South Wales, Australia (Hillston, n = 384). Calibration sample sizes ranging from 50 to 3,000 were derived and tested for the continental dataset, and from 50 to 200 for the regional and local datasets. RESULTS: Overall, PLSR gives better predictions of the various soil properties than the Cubist model. It is also less sensitive to the choice of sampling algorithm. The KM algorithm is more representative in the larger dataset up to a certain calibration sample size. The KS algorithm appears to be more efficient than random sampling in small datasets; however, its prediction performance varied considerably between soil properties. The cLHS algorithm is the most robust sampling method for multiple soil properties regardless of the sample size. DISCUSSION: Our results suggest that the optimum calibration sample size depends on how much generalization the model has to achieve. Using a sampling algorithm is more beneficial for larger datasets than for smaller ones, where only small improvements can be made. KM is suitable for large datasets, KS is efficient in small datasets but its results can be variable, and cLHS is less affected by sample size.
format	Online Article Text
id	pubmed-6173947
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-6173947 2018-10-11 In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra Ng, Wartini Minasny, Budiman Malone, Brendan Filippi, Patrick PeerJ Soil Science PeerJ Inc. 2018-10-03 /pmc/articles/PMC6173947/ /pubmed/30310751 http://dx.doi.org/10.7717/peerj.5722 Text en ©2018 Ng et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle	Soil Science Ng, Wartini Minasny, Budiman Malone, Brendan Filippi, Patrick In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra
title	In search of an optimum sampling algorithm for prediction of soil properties from infrared spectra
topic	Soil Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6173947/ https://www.ncbi.nlm.nih.gov/pubmed/30310751 http://dx.doi.org/10.7717/peerj.5722
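
The description above names Kennard-Stone (KS) as one of the calibration sampling algorithms compared against random sampling. As a rough illustration only, the following is a minimal pure-Python sketch of the classic KS selection rule: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. The function name `kennard_stone`, the use of Euclidean distance on raw feature rows, and the toy data are assumptions made for this sketch; the record does not state how the authors implemented or preprocessed anything.

```python
import math


def kennard_stone(rows, n_select):
    """Return indices of n_select rows picked by the Kennard-Stone rule.

    Assumption for this sketch: plain Euclidean distance between rows.
    """
    n = len(rows)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(rows)")

    # Full pairwise distance matrix; O(n^2) memory, fine for small examples.
    dist = [[math.dist(a, b) for b in rows] for a in rows]

    # Step 1: seed the selection with the two most distant samples.
    i, j = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i, j]

    # min_dist[k] = distance from sample k to its nearest selected sample.
    min_dist = [min(dist[k][i], dist[k][j]) for k in range(n)]

    # Step 2: repeatedly add the candidate farthest from the selected set.
    while len(selected) < n_select:
        chosen = set(selected)
        k = max(
            (c for c in range(n) if c not in chosen),
            key=lambda c: min_dist[c],
        )
        selected.append(k)
        min_dist = [min(min_dist[c], dist[c][k]) for c in range(n)]

    return selected


if __name__ == "__main__":
    # Toy one-feature rows standing in for spectra (hypothetical data).
    toy = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)]
    print(kennard_stone(toy, 4))  # -> [0, 3, 4, 2]
```

In the toy run the two extremes (0 and 10) are picked first, then the midpoint 5, then 2, which shows how the rule spreads the selection across the feature space rather than following the sample density. The full distance matrix makes this sketch unsuitable as written for a dataset the size of LUCAS.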

Similar Items