
Data Leakage and Loss in Biodiversity Informatics

Bibliographic Details
Main Authors: Peterson, A. Townsend; Asase, Alex; Canhos, Dora Ann Lange; de Souza, Sidnei; Wieczorek, John
Format: Online, Article, Text
Language: English
Published: Pensoft Publishers, 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235996/
https://www.ncbi.nlm.nih.gov/pubmed/30473617
http://dx.doi.org/10.3897/BDJ.6.e26826
author Peterson, A. Townsend
Asase, Alex
Canhos, Dora Ann Lange
de Souza, Sidnei
Wieczorek, John
collection PubMed
description Abstract. The field of biodiversity informatics is in a massive, “grow-out” phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data “leakage” or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.
format Online
Article
Text
id pubmed-6235996
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-6235996 2018-11-23 Data Leakage and Loss in Biodiversity Informatics Peterson, A. Townsend Asase, Alex Canhos, Dora Ann Lange de Souza, Sidnei Wieczorek, John Biodivers Data J Research Article Pensoft Publishers 2018-11-07 /pmc/articles/PMC6235996/ /pubmed/30473617 http://dx.doi.org/10.3897/BDJ.6.e26826 Text en A. Townsend Peterson, Alex Asase, Dora Canhos, Sidnei de Souza, John Wieczorek http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Data Leakage and Loss in Biodiversity Informatics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235996/
https://www.ncbi.nlm.nih.gov/pubmed/30473617
http://dx.doi.org/10.3897/BDJ.6.e26826