Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso)
Predicting taxonomic classes can be challenging with datasets subject to substantial irregularities due to the involvement of many surveyors. A data pruning approach was used in the present study to reduce such source errors by exploring whether different data pruning methods, which result in differe...
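Main authors: | Hounkpatin, Kpade O. L.; Schmidt, Karsten; Stumpf, Felix; Forkuor, Gerald; Behrens, Thorsten; Scholten, Thomas; Amelung, Wulf; Welp, Gerhard |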
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6028482/ https://www.ncbi.nlm.nih.gov/pubmed/29967391 http://dx.doi.org/10.1038/s41598-018-28244-w |
_version_ | 1783336772184309760 |
---|---|
author | Hounkpatin, Kpade O. L.; Schmidt, Karsten; Stumpf, Felix; Forkuor, Gerald; Behrens, Thorsten; Scholten, Thomas; Amelung, Wulf; Welp, Gerhard |
author_sort | Hounkpatin, Kpade O. L. |
collection | PubMed |
description | Predicting taxonomic classes can be challenging with datasets subject to substantial irregularities due to the involvement of many surveyors. A data pruning approach was used in the present study to reduce such source errors by exploring whether different data pruning methods, which result in different subsets of a major reference soil group (RSG), the Plinthosols, would lead to an increase in the prediction accuracy of the minor soil groups when using Random Forest (RF). This method was compared to the random oversampling approach. Four datasets were used: the entire dataset and three pruned datasets, which retained the 80%, 90%, and standard-deviation core ranges of the Plinthosols data, respectively, while cutting off all data points belonging to the outer range. The best prediction was achieved when RF was used with recursive feature elimination along with the non-oversampled 90% core range dataset. This model showed substantial agreement with observations, with a kappa value of 0.57 along with a 7% to 35% increase in prediction accuracy for the smaller RSGs. The reference soil groups in the Dano catchment appeared to be mainly influenced by the wetness index, a proxy for soil moisture distribution. |
format | Online Article Text |
id | pubmed-6028482 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6028482 2018-07-09 Sci Rep Article Nature Publishing Group UK 2018-07-02 /pmc/articles/PMC6028482/ /pubmed/29967391 http://dx.doi.org/10.1038/s41598-018-28244-w Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Predicting reference soil groups using legacy data: A data pruning and Random Forest approach for tropical environment (Dano catchment, Burkina Faso) |
topic | Article |
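
The description in this record mentions pruning the Plinthosols data to a core range and scoring the model with a kappa value. The sketch below is one possible illustration of those two ideas, not the authors' procedure. It assumes pruning along a single covariate using rank-based cut-offs, and the names `core_range_prune`, `cohen_kappa`, `rsg`, and `wetness_index` are hypothetical. Cohen's kappa follows the standard definition, kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def core_range_prune(rows, label_key, value_key, major_label, keep_fraction):
    """Keep the central `keep_fraction` of the major class along one covariate.

    Assumption: the core range is defined by symmetric rank cut-offs on a
    single covariate; rows of all other classes are kept unchanged.
    """
    major_values = sorted(r[value_key] for r in rows if r[label_key] == major_label)
    if not major_values:
        return list(rows)
    n = len(major_values)
    # Number of points to cut from each tail (rounded to avoid float truncation).
    cut = int(round((1.0 - keep_fraction) / 2.0 * n))
    cut = min(cut, (n - 1) // 2)  # never cut past the middle of the class
    lo, hi = major_values[cut], major_values[n - 1 - cut]
    return [
        r for r in rows
        if r[label_key] != major_label or lo <= r[value_key] <= hi
    ]


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label sequences beyond chance."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("label sequences must be non-empty and of equal length")
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(obs_counts[c] * pred_counts.get(c, 0) for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sequences contain a single identical class
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy data only: ten hypothetical Plinthosol points and two minor-class points.
    rows = [{"rsg": "Plinthosol", "wetness_index": float(i)} for i in range(10)]
    rows += [{"rsg": "Gleysol", "wetness_index": 20.0},
             {"rsg": "Cambisol", "wetness_index": -5.0}]
    pruned = core_range_prune(rows, "rsg", "wetness_index", "Plinthosol", 0.8)
    print(len(rows), len(pruned))
    print(cohen_kappa(["P", "P", "L", "L"], ["P", "P", "L", "P"]))
```

In a real workflow the pruned rows would then be passed to a classifier such as a Random Forest, and the kappa would be computed on held-out observations; both steps are outside this sketch.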