
The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species

The availability of spatially referenced environmental data and species occurrence records in online databases enables practitioners to easily generate species distribution models (SDMs) for a broad array of taxa. Such databases often include occurrence records of unknown reliability, yet little info...


Bibliographic Details
Main Authors: Aubry, Keith B., Raley, Catherine M., McKelvey, Kevin S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5480872/
https://www.ncbi.nlm.nih.gov/pubmed/28640819
http://dx.doi.org/10.1371/journal.pone.0179152
author Aubry, Keith B.
Raley, Catherine M.
McKelvey, Kevin S.
collection PubMed
description The availability of spatially referenced environmental data and species occurrence records in online databases enables practitioners to easily generate species distribution models (SDMs) for a broad array of taxa. Such databases often include occurrence records of unknown reliability, yet little information is available on the influence of data quality on SDMs generated for rare, elusive, and cryptic species that are prone to misidentification in the field. We investigated this question for the fisher (Pekania pennanti), a forest carnivore of conservation concern in the Pacific States that is often confused with the more common Pacific marten (Martes caurina). Fisher occurrence records supported by physical evidence (verifiable records) were available from a limited area, whereas occurrence records of unknown quality (unscreened records) were available from throughout the fisher’s historical range. We reserved 20% of the verifiable records to use as a test sample for both models and generated SDMs with each dataset using Maxent. The verifiable model performed substantially better than the unscreened model based on multiple metrics including AUC(test) values (0.78 and 0.62, respectively), evaluation of training and test gains, and statistical tests of how well each model predicted test localities. In addition, the verifiable model was consistent with our knowledge of the fisher’s habitat relations and potential distribution, whereas the unscreened model indicated a much broader area of high-quality habitat (indices > 0.5) that included large expanses of high-elevation habitat that fishers do not occupy. Because Pacific martens remain relatively common in upper elevation habitats in the Cascade Range and Sierra Nevada, the SDM based on unscreened records likely reflects primarily a conflation of marten and fisher habitat. Consequently, accurate identifications are far more important than the spatial extent of occurrence records for generating reliable SDMs for the fisher in this region. We strongly recommend that practitioners avoid using anecdotal occurrence records to build SDMs but, if such data are used, the validity of resulting models should be tested with verifiable occurrence records.
format Online
Article
Text
id pubmed-5480872
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5480872 2017-07-05 PLoS One Research Article Public Library of Science 2017-06-22 /pmc/articles/PMC5480872/ /pubmed/28640819 http://dx.doi.org/10.1371/journal.pone.0179152 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5480872/
https://www.ncbi.nlm.nih.gov/pubmed/28640819
http://dx.doi.org/10.1371/journal.pone.0179152