
Quantifying the degree of bias from using county‐scale data in species distribution modeling: Can increasing sample size or using county‐averaged environmental data reduce distributional overprediction?

Citizen‐science databases have been used to develop species distribution models (SDMs), although many taxa may be only georeferenced to county. It is tacitly assumed that SDMs built from county‐scale data should be less precise than those built with more accurate localities, but the extent of the bi...


Bibliographic Details
Main Authors: Collins, Steven D., Abbott, John C., McIntyre, Nancy E.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5551104/
https://www.ncbi.nlm.nih.gov/pubmed/28808561
http://dx.doi.org/10.1002/ece3.3115
author Collins, Steven D.
Abbott, John C.
McIntyre, Nancy E.
collection PubMed
description Citizen‐science databases have been used to develop species distribution models (SDMs), although many taxa may be only georeferenced to county. It is tacitly assumed that SDMs built from county‐scale data should be less precise than those built with more accurate localities, but the extent of the bias is currently unknown. Our aims in this study were to illustrate the effects of using county‐scale data on the spatial extent and accuracy of SDMs relative to true locality data and to compare potential compensatory methods (including increased sample size and using overall county environmental averages rather than point locality environmental data). To do so, we developed SDMs in maxent with PRISM‐derived BIOCLIM parameters for 283 and 230 species of odonates (dragonflies and damselflies) and butterflies, respectively, for five subsets from the OdonataCentral and Butterflies and Moths of North America citizen‐science databases: (1) a true locality dataset, (2) a corresponding sister dataset of county‐centroid coordinates, (3) a dataset where the average environmental conditions within each county were assigned to each record, (4) a 50/50% mix of true localities and county‐centroid coordinates, and (5) a 50/50% mix of true localities and records assigned the average environmental conditions within each county. These mixtures allowed us to quantify the degree of bias from county‐scale data. Models developed with county centroids overpredicted the extent of suitable habitat by 15% on average compared to true locality models, although larger sample sizes (>100 locality records) reduced this disparity. Assigning county‐averaged environmental conditions did not offer consistent improvement, however. Because county‐level data are of limited value for developing SDMs except for species that are widespread and well collected or that inhabit regions where small, climatically uniform counties predominate, three means of encouraging more accurate georeferencing in citizen‐science databases are provided.
format Online
Article
Text
id pubmed-5551104
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5551104 2017-08-14 Ecol Evol Original Research John Wiley and Sons Inc. 2017-06-28 /pmc/articles/PMC5551104/ /pubmed/28808561 http://dx.doi.org/10.1002/ece3.3115 Text en © 2017 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Quantifying the degree of bias from using county‐scale data in species distribution modeling: Can increasing sample size or using county‐averaged environmental data reduce distributional overprediction?
topic Original Research