Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data

BACKGROUND: Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heteroge...

Bibliographic Details
Main authors: Hill, Andrew W, Guralnick, Robert, Flemons, Paul, Beaman, Reed, Wieczorek, John, Ranipeta, Ajay, Chavan, Vishwas, Remsen, David
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775149/
https://www.ncbi.nlm.nih.gov/pubmed/19900299
http://dx.doi.org/10.1186/1471-2105-10-S14-S3
_version_ 1782173991739326464
author Hill, Andrew W
Guralnick, Robert
Flemons, Paul
Beaman, Reed
Wieczorek, John
Ranipeta, Ajay
Chavan, Vishwas
Remsen, David
author_facet Hill, Andrew W
Guralnick, Robert
Flemons, Paul
Beaman, Reed
Wieczorek, John
Ranipeta, Ajay
Chavan, Vishwas
Remsen, David
author_sort Hill, Andrew W
collection PubMed
description BACKGROUND: Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heterogeneous and not in a map-ready format. The biodiversity informatics community has developed best practices and tools that provide the means to do retrospective georeferencing (e.g., the BioGeomancer toolkit), a process that converts heterogeneous descriptions into geographic coordinates and a measurement of spatial uncertainty. Even with these methods and tools, data publishers are faced with the immensely time-consuming task of vetting georeferenced localities. Furthermore, it is likely that overlap in georeferencing effort is occurring across data publishers. Solutions are needed that help publishers more effectively georeference their records, verify their quality, and eliminate the duplication of effort across publishers. RESULTS: We have developed a tool called BioGeoBIF, which incorporates the high throughput and standardized georeferencing methods of BioGeomancer into a beginning-to-end workflow. Custodians who publish their data to the Global Biodiversity Information Facility (GBIF) can use this system to improve the quantity and quality of their georeferences. BioGeoBIF harvests records directly from the publishers' access points, georeferences the records using the BioGeomancer web-service, and makes results available to data managers for inclusion at the source. Using a web-based, password-protected, group management system for each data publisher, we leave data ownership, management, and vetting responsibilities with the managers and collaborators of each data set. 
We also minimize the georeferencing task, by combining and storing unique textual localities from all registered data access points, and dynamically linking that information to the password protected record information for each publisher. CONCLUSION: We have developed one of the first examples of services that can help create higher quality data for publishers mediated through the Global Biodiversity Information Facility and its data portal. This service is one step towards solving many problems of data quality in the growing field of biodiversity informatics. We envision future improvements to our service that include faster results returns and inclusion of more georeferencing engines.
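The deduplication step described in the abstract (storing each unique textual locality once and linking it back to every publisher record that uses it) can be illustrated with a minimal Python sketch. This is not the BioGeoBIF implementation: the record layout, the normalization rule, and the function names `normalize_locality`, `index_localities`, and `georeference_unique` are assumptions made for illustration, and the `georeference` argument stands in for whatever service call is actually used.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: (publisher_id, record_id, locality_text).
Record = Tuple[str, str, str]
# Hypothetical result layout: (latitude, longitude, uncertainty_in_meters).
Georeference = Tuple[float, float, float]


def normalize_locality(text: str) -> str:
    """Collapse whitespace and case so trivially different strings match.

    This rule is an assumption; the source does not say how localities
    are compared.
    """
    return " ".join(text.lower().split())


def index_localities(
    records: Iterable[Record],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each unique locality string to the records that share it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for publisher_id, record_id, locality in records:
        index[normalize_locality(locality)].append((publisher_id, record_id))
    return dict(index)


def georeference_unique(
    index: Dict[str, List[Tuple[str, str]]],
    georeference: Callable[[str], Georeference],
) -> Dict[str, Georeference]:
    """Call the georeferencing function once per unique locality."""
    return {locality: georeference(locality) for locality in index}
```

With an index like this, a locality shared by several publishers is georeferenced only once, and each publisher can look up the result through its own record identifiers.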
format Text
id pubmed-2775149
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2775149 2009-11-10 Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data Hill, Andrew W Guralnick, Robert Flemons, Paul Beaman, Reed Wieczorek, John Ranipeta, Ajay Chavan, Vishwas Remsen, David BMC Bioinformatics Research BACKGROUND: Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heterogeneous and not in a map-ready format. The biodiversity informatics community has developed best practices and tools that provide the means to do retrospective georeferencing (e.g., the BioGeomancer toolkit), a process that converts heterogeneous descriptions into geographic coordinates and a measurement of spatial uncertainty. Even with these methods and tools, data publishers are faced with the immensely time-consuming task of vetting georeferenced localities. Furthermore, it is likely that overlap in georeferencing effort is occurring across data publishers. Solutions are needed that help publishers more effectively georeference their records, verify their quality, and eliminate the duplication of effort across publishers. RESULTS: We have developed a tool called BioGeoBIF, which incorporates the high throughput and standardized georeferencing methods of BioGeomancer into a beginning-to-end workflow. Custodians who publish their data to the Global Biodiversity Information Facility (GBIF) can use this system to improve the quantity and quality of their georeferences. BioGeoBIF harvests records directly from the publishers' access points, georeferences the records using the BioGeomancer web-service, and makes results available to data managers for inclusion at the source. 
Using a web-based, password-protected, group management system for each data publisher, we leave data ownership, management, and vetting responsibilities with the managers and collaborators of each data set. We also minimize the georeferencing task, by combining and storing unique textual localities from all registered data access points, and dynamically linking that information to the password protected record information for each publisher. CONCLUSION: We have developed one of the first examples of services that can help create higher quality data for publishers mediated through the Global Biodiversity Information Facility and its data portal. This service is one step towards solving many problems of data quality in the growing field of biodiversity informatics. We envision future improvements to our service that include faster results returns and inclusion of more georeferencing engines. BioMed Central 2009-11-10 /pmc/articles/PMC2775149/ /pubmed/19900299 http://dx.doi.org/10.1186/1471-2105-10-S14-S3 Text en Copyright © 2009 Hill et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Hill, Andrew W
Guralnick, Robert
Flemons, Paul
Beaman, Reed
Wieczorek, John
Ranipeta, Ajay
Chavan, Vishwas
Remsen, David
Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
title Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
title_full Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
title_fullStr Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
title_full_unstemmed Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
title_short Location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
title_sort location, location, location: utilizing pipelines and services to more effectively georeference the world's biodiversity data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775149/
https://www.ncbi.nlm.nih.gov/pubmed/19900299
http://dx.doi.org/10.1186/1471-2105-10-S14-S3