Data Leakage and Loss in Biodiversity Informatics
Main authors: | Peterson, A. Townsend; Asase, Alex; Canhos, Dora Ann Lange; de Souza, Sidnei; Wieczorek, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Pensoft Publishers, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235996/ https://www.ncbi.nlm.nih.gov/pubmed/30473617 http://dx.doi.org/10.3897/BDJ.6.e26826 |
author | Peterson, A. Townsend; Asase, Alex; Canhos, Dora Ann Lange; de Souza, Sidnei; Wieczorek, John |
---|---|
collection | PubMed |
description | Abstract. The field of biodiversity informatics is in a massive, “grow-out” phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data “leakage” or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge. |
format | Online Article Text |
id | pubmed-6235996 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Pensoft Publishers |
record_format | MEDLINE/PubMed |
journal | Biodivers Data J (Research Article), published online 2018-11-07 |
license | © A. Townsend Peterson, Alex Asase, Dora Canhos, Sidnei de Souza, John Wieczorek. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Data Leakage and Loss in Biodiversity Informatics |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235996/ https://www.ncbi.nlm.nih.gov/pubmed/30473617 http://dx.doi.org/10.3897/BDJ.6.e26826 |