
Integrating and visualizing primary data from prospective and legacy taxonomic literature

Abstract. Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxono...


Bibliographic Details
Main Authors:	Miller, Jeremy A., Agosti, Donat, Penev, Lyubomir, Sautter, Guido, Georgiev, Teodor, Catapano, Terry, Patterson, David, King, David, Pereira, Serrano, Vos, Rutger Aldo, Sierra, Soraya
Format:	Online Article Text
Language:	English
Published:	Pensoft Publishers 2015
Subjects:	General Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4442254/ https://www.ncbi.nlm.nih.gov/pubmed/26023286 http://dx.doi.org/10.3897/BDJ.3.e5063

_version_	1782372878467989504
author	Miller, Jeremy A. Agosti, Donat Penev, Lyubomir Sautter, Guido Georgiev, Teodor Catapano, Terry Patterson, David King, David Pereira, Serrano Vos, Rutger Aldo Sierra, Soraya
author_facet	Miller, Jeremy A. Agosti, Donat Penev, Lyubomir Sautter, Guido Georgiev, Teodor Catapano, Terry Patterson, David King, David Pereira, Serrano Vos, Rutger Aldo Sierra, Soraya
author_sort	Miller, Jeremy A.
collection	PubMed
description	Abstract. Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. 
We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.
format	Online Article Text
id	pubmed-4442254
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Pensoft Publishers
record_format	MEDLINE/PubMed
spelling	pubmed-44422542015-05-28 Integrating and visualizing primary data from prospective and legacy taxonomic literature Miller, Jeremy A. Agosti, Donat Penev, Lyubomir Sautter, Guido Georgiev, Teodor Catapano, Terry Patterson, David King, David Pereira, Serrano Vos, Rutger Aldo Sierra, Soraya Biodivers Data J General Research Article Abstract. Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. 
We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions. Pensoft Publishers 2015-05-12 /pmc/articles/PMC4442254/ /pubmed/26023286 http://dx.doi.org/10.3897/BDJ.3.e5063 Text en Jeremy A. Miller, Donat Agosti, Lyubomir Penev, Guido Sautter, Teodor Georgiev, Terry Catapano, David Patterson, David King, Serrano Pereira, Rutger Aldo Vos, Soraya Sierra http://creativecommons.org/licenses/by/4.0 This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	General Research Article Miller, Jeremy A. Agosti, Donat Penev, Lyubomir Sautter, Guido Georgiev, Teodor Catapano, Terry Patterson, David King, David Pereira, Serrano Vos, Rutger Aldo Sierra, Soraya Integrating and visualizing primary data from prospective and legacy taxonomic literature
title	Integrating and visualizing primary data from prospective and legacy taxonomic literature
topic	General Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4442254/ https://www.ncbi.nlm.nih.gov/pubmed/26023286 http://dx.doi.org/10.3897/BDJ.3.e5063
