Machine learning of large‐scale spatial distributions of wild turkeys with high‐dimensional environmental data

Bibliographic Details
Main authors: Farrell, Annie, Wang, Guiming, Rush, Scott A., Martin, James A., Belant, Jerrold L., Butler, Adam B., Godwin, Dave
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6540709/
https://www.ncbi.nlm.nih.gov/pubmed/31161010
http://dx.doi.org/10.1002/ece3.5177
collection PubMed
description Species distribution modeling often involves high‐dimensional environmental data. Large amounts of data and multicollinearity among covariates pose challenges to variable selection in statistical models, and thus to reliable inference about the effects of environmental factors on the spatial distribution of species. Few studies have evaluated and compared the performance of multiple machine learning (ML) models in handling multicollinearity. Here, we assessed the effectiveness of removing correlated covariates and of regularization in coping with multicollinearity in ML models of habitat suitability. Three machine learning algorithms, namely maximum entropy (MaxEnt), random forests (RFs), and support vector machines (SVMs), were applied to the original data (OD) of 27 landscape variables, to reduced data (RD) from which 14 highly correlated covariates had been removed, and to 15 principal components (PC) of the OD accounting for 90% of the original variability. The performance of the three ML models was measured with the area under the curve and the continuous Boyce index. We collected 663 nonduplicated presence locations of Eastern wild turkeys (Meleagris gallopavo silvestris) across the state of Mississippi, United States. Of these, 453 locations separated by a distance of ≥2 km were used to train the three ML algorithms on the OD, RD, and PC data, respectively. The remaining 210 locations were used to validate the trained ML models and measure their performance. All three ML models had excellent performance on the RD and PC data. MaxEnt and SVMs had good performance on the OD data, indicating that their default regularization settings were adequate for handling multicollinearity. Weak learning of RFs through bagging appeared to alleviate multicollinearity and resulted in excellent performance on the OD data. Regularization of ML algorithms may help exploratory studies of the effects of environmental factors on the spatial distribution and habitat suitability of wildlife.
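The abstract contrasts three covariate sets: the original 27 variables (OD), a reduced set with highly correlated covariates removed (RD), and the principal components that retain 90% of the variance (PC). As an illustration only, the NumPy sketch below builds RD-like and PC-like versions of a covariate matrix. The correlation cutoff of 0.7, the greedy keep-first removal order, and standardizing the variables before PCA are assumptions of this sketch, not the authors' documented procedure, and the function names are hypothetical.

```python
import numpy as np


def drop_correlated(X, threshold=0.7):
    """RD-style reduction (hypothetical helper).

    Walks the columns in order and keeps a column only if its absolute
    Pearson correlation with every already-kept column is <= threshold.
    The 0.7 cutoff and keep-first order are illustrative assumptions.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], kept


def pca_scores(X, target=0.90):
    """PC-style reduction (hypothetical helper).

    Standardizes each column, then projects onto the fewest principal
    components whose cumulative explained variance reaches `target`.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    # eigh returns eigenvalues in ascending order; sort them descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # First index where cumulative variance >= target, converted to a count.
    k = min(int(np.searchsorted(explained, target)) + 1, len(eigvals))
    return Z @ eigvecs[:, :k], k
```

With a matrix `X` of landscape covariates (rows are locations, columns are variables), `drop_correlated(X)` returns an RD-style subset plus the indices of the kept columns, and `pca_scores(X, 0.90)` returns PC-style scores plus the number of components retained. The keep-first rule is simply the easiest to show; screening by variance inflation factor, or keeping the more ecologically interpretable member of each correlated pair, are common alternatives.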
id pubmed-6540709
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6540709 2019-06-03
Ecol Evol
Original Research
John Wiley and Sons Inc. 2019-04-24
© 2019 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
topic Original Research