Cargando…

From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories

Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as basis for large scale an...

Descripción completa

Detalles Bibliográficos
Autores principales: Smith, B. Eugene, Johnston, Mark K., Lücking, Robert
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788202/
https://www.ncbi.nlm.nih.gov/pubmed/26967999
http://dx.doi.org/10.1371/journal.pone.0151232
_version_ 1782420707763814400
author Smith, B. Eugene
Johnston, Mark K.
Lücking, Robert
author_facet Smith, B. Eugene
Johnston, Mark K.
Lücking, Robert
author_sort Smith, B. Eugene
collection PubMed
description Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as basis for large scale analyses of macroecological and biogeographic patterns and to document environmental changes over time. However, taxonomic identifications are often unreliable, especially for non-vascular plants and fungi including lichens, which may lack critical revisions of voucher specimens. Due to the scale of the problem, restudy of millions of collections is unrealistic and other strategies are needed. Here we propose to use verified, georeferenced occurrence data of a given species to apply predictive niche modeling that can then be used to evaluate unverified occurrences of that species. Selecting the charismatic lichen fungus, Usnea longissima, as a case study, we used georeferenced occurrence records based on sequenced specimens to model its predicted niche. Our results suggest that the target species is largely restricted to a narrow range of boreal and temperate forest in the Northern Hemisphere and that occurrence records in GBIF from tropical regions and the Southern Hemisphere do not represent this taxon, a prediction tested by comparison with taxonomic revisions of Usnea for these regions. As a novel approach, we employed Principal Component Analysis on the environmental grid data used for predictive modeling to visualize potential ecogeographical barriers for the target species; we found that tropical regions conform a strong barrier, explaining why potential niches in the Southern Hemisphere were not colonized by Usnea longissima and instead by morphologically similar species. This approach is an example of how data from two of the most important biodiversity repositories, GenBank and GBIF, can be effectively combined to remotely address the problem of inaccuracy of taxonomic identifications in occurrence data repositories and to provide a filtering mechanism which can considerably reduce the number of voucher specimens that need critical revision, in this case from 4,672 to about 100.
format Online
Article
Text
id pubmed-4788202
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-47882022016-03-23 From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories Smith, B. Eugene Johnston, Mark K. Lücking, Robert PLoS One Research Article Accuracy of taxonomic identifications is crucial to data quality in online repositories of species occurrence data, such as the Global Biodiversity Information Facility (GBIF), which have accumulated several hundred million records over the past 15 years. These data serve as basis for large scale analyses of macroecological and biogeographic patterns and to document environmental changes over time. However, taxonomic identifications are often unreliable, especially for non-vascular plants and fungi including lichens, which may lack critical revisions of voucher specimens. Due to the scale of the problem, restudy of millions of collections is unrealistic and other strategies are needed. Here we propose to use verified, georeferenced occurrence data of a given species to apply predictive niche modeling that can then be used to evaluate unverified occurrences of that species. Selecting the charismatic lichen fungus, Usnea longissima, as a case study, we used georeferenced occurrence records based on sequenced specimens to model its predicted niche. Our results suggest that the target species is largely restricted to a narrow range of boreal and temperate forest in the Northern Hemisphere and that occurrence records in GBIF from tropical regions and the Southern Hemisphere do not represent this taxon, a prediction tested by comparison with taxonomic revisions of Usnea for these regions. As a novel approach, we employed Principal Component Analysis on the environmental grid data used for predictive modeling to visualize potential ecogeographical barriers for the target species; we found that tropical regions conform a strong barrier, explaining why potential niches in the Southern Hemisphere were not colonized by Usnea longissima and instead by morphologically similar species. This approach is an example of how data from two of the most important biodiversity repositories, GenBank and GBIF, can be effectively combined to remotely address the problem of inaccuracy of taxonomic identifications in occurrence data repositories and to provide a filtering mechanism which can considerably reduce the number of voucher specimens that need critical revision, in this case from 4,672 to about 100. Public Library of Science 2016-03-11 /pmc/articles/PMC4788202/ /pubmed/26967999 http://dx.doi.org/10.1371/journal.pone.0151232 Text en © 2016 Smith et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Smith, B. Eugene
Johnston, Mark K.
Lücking, Robert
From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories
title From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories
title_full From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories
title_fullStr From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories
title_full_unstemmed From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories
title_short From GenBank to GBIF: Phylogeny-Based Predictive Niche Modeling Tests Accuracy of Taxonomic Identifications in Large Occurrence Data Repositories
title_sort from genbank to gbif: phylogeny-based predictive niche modeling tests accuracy of taxonomic identifications in large occurrence data repositories
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788202/
https://www.ncbi.nlm.nih.gov/pubmed/26967999
http://dx.doi.org/10.1371/journal.pone.0151232
work_keys_str_mv AT smithbeugene fromgenbanktogbifphylogenybasedpredictivenichemodelingtestsaccuracyoftaxonomicidentificationsinlargeoccurrencedatarepositories
AT johnstonmarkk fromgenbanktogbifphylogenybasedpredictivenichemodelingtestsaccuracyoftaxonomicidentificationsinlargeoccurrencedatarepositories
AT luckingrobert fromgenbanktogbifphylogenybasedpredictivenichemodelingtestsaccuracyoftaxonomicidentificationsinlargeoccurrencedatarepositories