
Geographic name resolution service: A tool for the standardization and indexing of world political division names, with applications to species distribution modeling



Bibliographic Details
Main Authors: Boyle, Bradley L., Maitner, Brian S., Barbosa, George G. C., Sajja, Rohith K., Feng, Xiao, Merow, Cory, Newman, Erica A., Park, Daniel S., Roehrdanz, Patrick R., Enquist, Brian J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9662723/
https://www.ncbi.nlm.nih.gov/pubmed/36374834
http://dx.doi.org/10.1371/journal.pone.0268162
author Boyle, Bradley L.
Maitner, Brian S.
Barbosa, George G. C.
Sajja, Rohith K.
Feng, Xiao
Merow, Cory
Newman, Erica A.
Park, Daniel S.
Roehrdanz, Patrick R.
Enquist, Brian J.
author_sort Boyle, Bradley L.
collection PubMed
description Massive biological databases of species occurrences, or georeferenced locations where a species has been observed, are essential inputs for modeling present and future species distributions. Location accuracy is often assessed by determining whether the observation geocoordinates fall within the boundaries of the declared political divisions. This otherwise simple validation is complicated by the difficulty of matching political division names to the correct geospatial object. Spelling errors, abbreviations, alternative codes, and synonyms in multiple languages present daunting name disambiguation challenges. The inability to resolve political division names reduces usable data, and analysis of erroneous observations can lead to flawed results. Here, we present the Geographic Name Resolution Service (GNRS), an application for correcting, standardizing, and indexing world political division names. The GNRS resolves political division names against a reference database that combines names and codes from GeoNames with geospatial object identifiers from the Global Administrative Areas Database (GADM). In a trial resolution of political division names extracted from >270 million species occurrences, only 1.9%, representing just 6% of occurrences, matched exactly to GADM political divisions in their original form. The GNRS was able to resolve, completely or in part, 92% of the remaining 378,568 political division names, or 86% of the full biodiversity occurrence dataset. In assessing geocoordinate accuracy for >239 million species occurrences, resolution of political divisions by the GNRS enabled the detection of an order of magnitude more errors and an order of magnitude more error-free occurrences. By providing a novel solution to a significant data quality impediment, the GNRS liberates a tremendous amount of biodiversity data for quantitative biodiversity research. 
The GNRS runs as a web service and is accessible via an API, an R package, and a web-based graphical user interface. Its modular architecture is easily integrated into existing data validation workflows.
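The description above names the core difficulty: spelling errors, abbreviations, alternative codes, and synonyms in several languages all have to be mapped to one standard political division name. The following is a minimal sketch of that general idea only, trying an exact lookup first and falling back to approximate string matching. It is not the GNRS algorithm or its API. The reference table, the `normalize` and `resolve` helpers, and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import difflib
import unicodedata

# Hypothetical miniature reference table: standard name -> known alternatives.
# A real reference database (e.g. one built from GeoNames) would be far larger.
REFERENCE = {
    "United States": ["USA", "US", "United States of America", "Estados Unidos"],
    "Brazil": ["Brasil", "BR", "BRA"],
    "Côte d'Ivoire": ["Ivory Coast", "CI", "CIV"],
}


def normalize(name):
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


# Map every normalized spelling (standard or alternative) to its standard name.
LOOKUP = {}
for standard, alternatives in REFERENCE.items():
    for spelling in [standard, *alternatives]:
        LOOKUP[normalize(spelling)] = standard


def resolve(raw, cutoff=0.8):
    """Return (standard_name, match_type) for a submitted name.

    match_type is "exact", "fuzzy", or "unresolved"; the cutoff is an
    illustrative assumption, not a value taken from the article.
    """
    key = normalize(raw)
    if key in LOOKUP:
        return LOOKUP[key], "exact"
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    if close:
        return LOOKUP[close[0]], "fuzzy"
    return None, "unresolved"


if __name__ == "__main__":
    for submitted in ["Brasil", "  united states ", "Brazl", "Atlantis"]:
        print(repr(submitted), "->", resolve(submitted))
```

Returning the match type alongside the name lets a downstream validation step treat fuzzy matches with more caution than exact ones, and keeps unresolved names visible instead of silently dropping them.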
format Online
Article
Text
id pubmed-9662723
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9662723 2022-11-15
PLoS One
Research Article
Public Library of Science 2022-11-14
/pmc/articles/PMC9662723/ /pubmed/36374834 http://dx.doi.org/10.1371/journal.pone.0268162
Text en
© 2022 Boyle et al. https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Geographic name resolution service: A tool for the standardization and indexing of world political division names, with applications to species distribution modeling
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9662723/
https://www.ncbi.nlm.nih.gov/pubmed/36374834
http://dx.doi.org/10.1371/journal.pone.0268162