The importance of digitized biocollections as a source of trait data and a new VertNet resource

Bibliographic Details
Main authors: Guralnick, Robert P., Zermoglio, Paula F., Wieczorek, John, LaFrance, Raphael, Bloom, David, Russell, Laura
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5199146/
https://www.ncbi.nlm.nih.gov/pubmed/28025346
http://dx.doi.org/10.1093/database/baw158
author Guralnick, Robert P.
Zermoglio, Paula F.
Wieczorek, John
LaFrance, Raphael
Bloom, David
Russell, Laura
collection PubMed
description For vast areas of the globe and large parts of the tree of life, data needed to inform trait diversity is incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized the quality and quantity of the extracted data. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for the broadest possible reuse. This work added more than 1.5 million harmonized measurements of vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for view and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content.
Database URL: http://portal.vertnet.org/search?advanced=1
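The description summarizes extracting body mass and length from free-text specimen records, harmonizing their units, and scoring the extraction against hand-checked data. As an illustration only, the sketch below shows one way such a step could work, assuming measurements appear as keyword, value, unit phrases such as "weight: 12.5 g". The regular expressions, unit tables, function names (`body_mass_g`, `total_length_mm`, `error_rates`), and the false-positive/false-negative definitions are hypothetical assumptions, not the VertNet toolkit.

```python
import math
import re
from typing import Iterable, Optional, Tuple

# HYPOTHETICAL unit tables (not VertNet's): factor converting each unit to the target unit.
MASS_TO_G = {"mg": 0.001, "g": 1.0, "gm": 1.0, "gms": 1.0, "kg": 1000.0}
LENGTH_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}

# HYPOTHETICAL patterns: a keyword, optional ":" or "=", a number, then a unit.
# The trailing \b stops a unit from matching the start of a longer word (e.g. "grams").
MASS_RE = re.compile(
    r"\b(?:body\s*mass|weight|wt|mass)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mg|kg|gms|gm|g)\b",
    re.IGNORECASE,
)
LENGTH_RE = re.compile(
    r"\b(?:total\s*length|body\s*length|tl)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mm|cm|m)\b",
    re.IGNORECASE,
)


def _extract(text: str, pattern: re.Pattern, to_target: dict) -> Optional[float]:
    """Return the first matching measurement converted to the target unit, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * to_target[unit.lower()]


def body_mass_g(text: str) -> Optional[float]:
    """Body mass in grams parsed from a free-text field, or None if absent."""
    return _extract(text, MASS_RE, MASS_TO_G)


def total_length_mm(text: str) -> Optional[float]:
    """Total length in millimetres parsed from a free-text field, or None if absent."""
    return _extract(text, LENGTH_RE, LENGTH_TO_MM)


def error_rates(
    pairs: Iterable[Tuple[Optional[float], Optional[float]]]
) -> Tuple[float, float]:
    """Compare extracted values with hand-checked ones (None means no value).

    ASSUMED definitions, as fractions of all records:
    false positive = a value was extracted but the truth is absent or different;
    false negative = no value was extracted but the truth has one.
    """
    pairs = list(pairs)
    fp = sum(
        1
        for got, want in pairs
        if got is not None and (want is None or not math.isclose(got, want))
    )
    fn = sum(1 for got, want in pairs if got is None and want is not None)
    return fp / len(pairs), fn / len(pairs)
```

For example, `body_mass_g("Body mass 1.5 kg")` returns 1500.0 and `total_length_mm("TL 180 mm")` returns 180.0. A production extractor would likely need many more keyword variants and unit spellings, plus handling for ranges and ambiguous values; this sketch only shows the extract, convert, and score pattern.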
format Online
Article
Text
id pubmed-5199146
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5199146 2017-01-06 Database (Oxford) Original Article Oxford University Press 2016-12-26
© The Author(s) 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The importance of digitized biocollections as a source of trait data and a new VertNet resource
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5199146/
https://www.ncbi.nlm.nih.gov/pubmed/28025346
http://dx.doi.org/10.1093/database/baw158