
Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso)

Predicting taxonomic classes can be challenging with datasets subject to substantial irregularities due to the involvement of many surveyors. A data pruning approach was used in the present study to reduce such source errors by exploring whether different data pruning methods, which result in differe...


Bibliographic Details
Main Authors: Hounkpatin, Kpade O. L., Schmidt, Karsten, Stumpf, Felix, Forkuor, Gerald, Behrens, Thorsten, Scholten, Thomas, Amelung, Wulf, Welp, Gerhard
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6028482/
https://www.ncbi.nlm.nih.gov/pubmed/29967391
http://dx.doi.org/10.1038/s41598-018-28244-w
author Hounkpatin, Kpade O. L.
Schmidt, Karsten
Stumpf, Felix
Forkuor, Gerald
Behrens, Thorsten
Scholten, Thomas
Amelung, Wulf
Welp, Gerhard
collection PubMed
description Predicting taxonomic classes can be challenging with datasets subject to substantial irregularities due to the involvement of many surveyors. A data pruning approach was used in the present study to reduce such source errors by exploring whether different data pruning methods, which result in different subsets of a major reference soil group (RSG), the Plinthosols, would lead to an increase in the prediction accuracy of the minor soil groups when using Random Forest (RF). This method was compared to the random oversampling approach. Four datasets were used: the entire dataset and three pruned datasets, which retained the 80%, 90%, and standard deviation core ranges of the Plinthosols data, respectively, while cutting off all data points belonging to the outer range. The best prediction was achieved when RF was used with recursive feature elimination along with the non-oversampled 90% core range dataset. This model showed substantial agreement with observations, with a kappa value of 0.57, along with a 7% to 35% increase in prediction accuracy for the smaller RSGs. The reference soil groups in the Dano catchment appeared to be mainly influenced by the wetness index, a proxy for soil moisture distribution.
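The pruning and agreement steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say how the core range was computed, so the sketch assumes a single hypothetical covariate and defines the core range as the central share of the majority-class values (for example, the 10th to 90th percentile for the 80% range). The function names `central_range`, `prune_major_class`, and `cohen_kappa` are illustrative, and the Random Forest and recursive feature elimination steps are left out.

```python
from collections import Counter


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an ascending list, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def central_range(values, coverage):
    """Bounds enclosing the central `coverage` share of values (assumed
    definition of the 'core range'; the source does not spell it out)."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0
    return percentile(ordered, tail), percentile(ordered, 1.0 - tail)


def prune_major_class(samples, major, coverage):
    """Keep every minor-class sample; keep major-class samples only when
    their covariate value lies inside the core range of the major class."""
    major_values = [x for x, label in samples if label == major]
    low, high = central_range(major_values, coverage)
    return [(x, label) for x, label in samples
            if label != major or low <= x <= high]


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label lists beyond chance."""
    n = len(observed)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs, pred = Counter(observed), Counter(predicted)
    p_e = sum(obs[c] * pred[c] for c in obs) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: (covariate value, soil group label).
samples = [(float(v), "Plinthosol") for v in range(11)]
samples += [(100.0, "Gleysol"), (-50.0, "Cambisol")]

pruned = prune_major_class(samples, "Plinthosol", coverage=0.8)
kappa = cohen_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"])
```

In this toy case the two extreme Plinthosol values are dropped while both minor-class samples are kept, which is the effect the pruning is meant to have on class balance.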
format Online
Article
Text
id pubmed-6028482
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6028482 2018-07-09 Sci Rep Article
Nature Publishing Group UK 2018-07-02 /pmc/articles/PMC6028482/ /pubmed/29967391 http://dx.doi.org/10.1038/s41598-018-28244-w Text en
© The Author(s) 2018. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso)
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6028482/
https://www.ncbi.nlm.nih.gov/pubmed/29967391
http://dx.doi.org/10.1038/s41598-018-28244-w