The use of classification and regression algorithms using the random forests method with presence-only data to model species’ distribution

Bibliographic Details
Main authors: Zhang, Lei; Huettmann, Falk; Zhang, Xudong; Liu, Shirong; Sun, Pengsen; Yu, Zhen; Mi, Chunrong
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects: Environmental Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6812352/
https://www.ncbi.nlm.nih.gov/pubmed/31667128
http://dx.doi.org/10.1016/j.mex.2019.09.035
author Zhang, Lei
Huettmann, Falk
Zhang, Xudong
Liu, Shirong
Sun, Pengsen
Yu, Zhen
Mi, Chunrong
collection PubMed
description Random forests (RF) is a powerful species distribution model (SDM) algorithm. This ensemble model by default can produce categorical and numerical species distribution maps based on its classification tree (CT) and regression tree (RT) algorithms, respectively. The CT algorithm can also produce numerical predictions (class probability). Here, we present a detailed procedure involving the use of the CT and RT algorithms using the RF method with presence-only data to model the distribution of species. CT and RT are used to generate numerical prediction maps, and then numerical predictions are converted to binary predictions through objective threshold-setting methods. We also applied simple methods to deal with collinearity of predictor variables and spatial autocorrelation of species occurrence data. A geographically stratified sampling method was employed for generating pseudo-absences. The detailed procedural framework is meant to be a generic method to be applied to virtually any SDM prediction question using presence-only data. • How to use RF as a standard method for generic species distributions with presence-only data; • How to choose RF (CT or RT) methods for the distribution modeling of species; • A general and detailed procedure for any SDM prediction question.
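The description notes that both the CT and RT algorithms can produce numerical predictions. The sketch below shows only the aggregation step, assuming the individual trees are already trained. For CT, the presence probability is taken as the proportion of trees voting "presence", a common way to get class probabilities from a vote-based forest. For RT fitted to a 0/1 response, the prediction is the mean of the trees' numeric outputs. The function names and example values are hypothetical, not the authors' code.

```python
# Minimal sketch: how a random forest combines per-tree outputs.
# Assumption: trees are already trained; names and values are hypothetical.


def ct_presence_probability(tree_votes):
    """CT forest: share of trees voting class 1 (presence)."""
    if not tree_votes:
        raise ValueError("need at least one tree")
    return sum(1 for v in tree_votes if v == 1) / len(tree_votes)


def rt_prediction(tree_outputs):
    """RT forest on a 0/1 response: mean of the trees' numeric outputs.
    Each output is a leaf mean (share of presences in the leaf), so it
    lies in [0, 1], and so does the forest average."""
    if not tree_outputs:
        raise ValueError("need at least one tree")
    return sum(tree_outputs) / len(tree_outputs)


if __name__ == "__main__":
    votes = [1, 0, 1, 1, 0]                 # hypothetical CT votes, 5 trees
    leaf_means = [0.9, 0.2, 0.7, 1.0, 0.4]  # hypothetical RT outputs
    print(ct_presence_probability(votes))   # 0.6
    print(rt_prediction(leaf_means))        # about 0.64
```

Both outputs are continuous scores on [0, 1], which is why either algorithm can feed the same threshold-setting step.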
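The record does not name the objective threshold-setting methods used to turn numerical predictions into binary maps. One widely used criterion, assumed here purely for illustration, is the cutoff that maximizes the true skill statistic (TSS = sensitivity + specificity - 1) on evaluation data. Scores at or above the chosen cutoff are mapped to presence. The helper names are hypothetical.

```python
# Minimal sketch of one objective threshold rule (max TSS), assumed for
# illustration; y_true must contain both presences (1) and absences (0).


def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when score >= threshold means presence."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def max_tss_threshold(y_true, scores):
    """Try every observed score as a cutoff; return (cutoff, best TSS).
    Ties keep the lowest cutoff because candidates are scanned in order."""
    best_t, best_tss = None, float("-inf")
    for t in sorted(set(scores)):
        sens, spec = sens_spec(y_true, scores, t)
        tss = sens + spec - 1
        if tss > best_tss:
            best_t, best_tss = t, tss
    return best_t, best_tss


def to_binary(scores, threshold):
    """Convert numerical predictions into a presence/absence map."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]                  # hypothetical evaluation labels
    s = [0.9, 0.8, 0.6, 0.5, 0.2, 0.1]      # hypothetical RF scores
    t, tss = max_tss_threshold(y, s)
    print(t, tss)                           # 0.6 1.0
    print(to_binary(s, t))                  # [1, 1, 1, 0, 0, 0]
```

Other objective rules (for example, equal sensitivity and specificity) would slot into the same loop by changing the score being maximized.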
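The description mentions "simple methods" for predictor collinearity without specifying them. A common simple approach, assumed here, is a pairwise Pearson correlation filter: predictors are visited in a chosen priority order and any predictor whose |r| with an already-kept one exceeds a cutoff (0.7 is a frequently used value, also an assumption) is dropped. Predictor names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(predictors, max_abs_r=0.7):
    """Greedy filter. `predictors` maps name -> values at the same sample
    points; dict order is the priority order. A predictor is kept only if
    its |r| with every already-kept predictor is <= max_abs_r."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    env = {  # hypothetical predictor values at five sample points
        "temp": [1, 2, 3, 4, 5],
        "temp_min": [2, 4, 6, 8, 10],  # r = 1 with temp, so dropped
        "precip": [5, 1, 4, 2, 3],     # r = -0.3 with temp, so kept
    }
    print(drop_collinear(env))  # ['temp', 'precip']
```

Putting the ecologically most interpretable predictors first in the dictionary makes the greedy filter keep them when a correlated pair is found.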
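For spatial autocorrelation in occurrence data, the record again refers only to "simple methods". One simple option, assumed here and not necessarily the authors' choice, is grid-based thinning: keep at most one occurrence per square cell so that clustered records do not dominate training. Coordinates are assumed to be projected (e.g. metres), and the cell size is a hypothetical parameter.

```python
import math


def thin_by_grid(points, cell_size):
    """Keep at most one occurrence per square grid cell.

    `points` is a sequence of (x, y) in projected units; the first record
    seen in each cell is retained and later ones in that cell are dropped."""
    seen = set()
    kept = []
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    occ = [(0.1, 0.1), (0.4, 0.2), (1.2, 0.3), (1.9, 1.8)]  # hypothetical
    print(thin_by_grid(occ, cell_size=1.0))
    # [(0.1, 0.1), (1.2, 0.3), (1.9, 1.8)]
```

A natural cell size is the resolution of the predictor rasters, so that each training cell contributes at most one presence.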
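The description states that pseudo-absences were generated by geographically stratified sampling, without giving details. The sketch below is one plausible reading, labelled as an assumption: the study extent is a rectangle split into equal grid strata, the same number of random points is drawn in each stratum, and candidates closer than a minimum distance to any presence record are rejected. All parameter names are hypothetical.

```python
import math
import random


def stratified_pseudo_absences(bbox, n_rows, n_cols, per_stratum,
                               presences, min_dist, seed=0, max_tries=10_000):
    """Draw `per_stratum` random points in each cell of an n_rows x n_cols
    grid over `bbox` = (xmin, ymin, xmax, ymax), rejecting any point closer
    than `min_dist` to a presence. A stratum may yield fewer points if
    `max_tries` is exhausted (e.g. it is mostly covered by presence buffers)."""
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) / n_cols
    dy = (ymax - ymin) / n_rows
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for r in range(n_rows):
        for c in range(n_cols):
            x0, y0 = xmin + c * dx, ymin + r * dy
            got = tries = 0
            while got < per_stratum and tries < max_tries:
                tries += 1
                p = (rng.uniform(x0, x0 + dx), rng.uniform(y0, y0 + dy))
                if all(math.dist(p, q) >= min_dist for q in presences):
                    out.append(p)
                    got += 1
    return out


if __name__ == "__main__":
    pres = [(2.5, 2.5), (7.0, 8.0)]  # hypothetical presence coordinates
    pa = stratified_pseudo_absences((0, 0, 10, 10), 2, 2, 5, pres, min_dist=1.0)
    print(len(pa))  # 20 (5 in each of the 4 strata)
```

Stratifying spreads pseudo-absences evenly across the extent instead of letting pure random sampling cluster them by chance.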
format Online
Article
Text
id pubmed-6812352
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6812352 2019-10-30 MethodsX Environmental Science Elsevier 2019-09-28 /pmc/articles/PMC6812352/ /pubmed/31667128 http://dx.doi.org/10.1016/j.mex.2019.09.035 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title The use of classification and regression algorithms using the random forests method with presence-only data to model species’ distribution
topic Environmental Science