
Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this...


Bibliographic Details
Main authors:	Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2017
Subjects:	Biodiversity
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5237372/ https://www.ncbi.nlm.nih.gov/pubmed/28097060 http://dx.doi.org/10.7717/peerj.2849

_version_	1782495523634151424
author	Mi, Chunrong Huettmann, Falk Guo, Yumin Han, Xuesong Wen, Lijia
author_sort	Mi, Chunrong
collection	PubMed
description	Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions across large undersampled areas, especially from few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities from the four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which to check model predictions. We found that Random Forest demonstrated the best performance under most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, although its use is increasing, its potential remains widely underused in conservation, in (spatial) ecological applications, and for inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
format	Online Article Text
id	pubmed-5237372
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-5237372 2017-01-17 PeerJ Biodiversity PeerJ Inc. 2017-01-12 /pmc/articles/PMC5237372/ /pubmed/28097060 http://dx.doi.org/10.7717/peerj.2849 Text en ©2017 Mi et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence
topic	Biodiversity
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5237372/ https://www.ncbi.nlm.nih.gov/pubmed/28097060 http://dx.doi.org/10.7717/peerj.2849


Similar Items