A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in t...
Main Authors: | Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4786095/ https://www.ncbi.nlm.nih.gov/pubmed/26964095 http://dx.doi.org/10.1371/journal.pone.0151131 |
_version_ | 1782420494466678784 |
---|---|
author | Qiu, Lefeng Wang, Kai Long, Wenli Wang, Ke Hu, Wei Amable, Gabriel S. |
author_sort | Qiu, Lefeng |
collection | PubMed |
description | Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R² value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R² values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale. |
format | Online Article Text |
id | pubmed-4786095 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4786095 2016-03-23 PLoS One Research Article Public Library of Science 2016-03-10 /pmc/articles/PMC4786095/ /pubmed/26964095 http://dx.doi.org/10.1371/journal.pone.0151131 Text en © 2016 Qiu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4786095/ https://www.ncbi.nlm.nih.gov/pubmed/26964095 http://dx.doi.org/10.1371/journal.pone.0151131 |
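
The abstract reports a random 222/54 calibration/validation split and compares the models by ME, MAE, RMSE, and R². A minimal Python sketch of how such a split and such metrics are commonly computed is given here for orientation only. It is not the authors' code: the function names are hypothetical, the error sign convention (predicted minus observed) and the R² definition (coefficient of determination, 1 - SS_res/SS_tot) are assumptions that the record does not confirm, and the sample values are made up.

```python
import math
import random


def split_samples(n_total=276, n_validation=54, seed=0):
    """Randomly split sample indices into calibration and validation sets.

    The default sizes follow the abstract (276 samples, 54 for validation);
    the seed and the shuffling scheme are assumptions for illustration.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[n_validation:], indices[:n_validation]


def validation_metrics(observed, predicted):
    """Return ME, MAE, RMSE and R2 for paired observed/predicted values.

    Assumed conventions: error = predicted - observed, and
    R2 = 1 - SS_res / SS_tot (coefficient of determination).
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "ME": sum(errors) / n,                     # mean error (bias)
        "MAE": sum(abs(e) for e in errors) / n,    # mean absolute error
        "RMSE": math.sqrt(ss_res / n),             # root mean squared error
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    calibration, validation = split_samples()
    print(len(calibration), len(validation))

    # Made-up Cd concentrations in mg/kg, not data from the study.
    observed = [0.1, 0.2, 0.3, 0.4]
    predicted = [0.2, 0.2, 0.2, 0.4]
    print(validation_metrics(observed, predicted))
```

Under these conventions an ME near zero indicates little systematic bias, while MAE and RMSE describe the typical size of the errors, which is how the abstract ranks RF ahead of CART and SLR.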