Toward a service-based workflow for automated information extraction from herbarium specimens

In recent years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor....

Bibliographic Details
Main Authors: Kirchhoff, Agnes, Bügel, Ulrich, Santamaria, Eduard, Reimeier, Fabian, Röpert, Dominik, Tebbje, Alexander, Güntsch, Anton, Chaves, Fernando, Steinke, Karl-Heinz, Berendsohn, Walter
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6174549/
https://www.ncbi.nlm.nih.gov/pubmed/30295725
http://dx.doi.org/10.1093/database/bay103
_version_ 1783361294490927104
author Kirchhoff, Agnes
Bügel, Ulrich
Santamaria, Eduard
Reimeier, Fabian
Röpert, Dominik
Tebbje, Alexander
Güntsch, Anton
Chaves, Fernando
Steinke, Karl-Heinz
Berendsohn, Walter
author_sort Kirchhoff, Agnes
collection PubMed
description In recent years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor. To streamline the capture of herbarium specimen metadata, we specified a formal, extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) output from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from OCR text. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded project 'A standardised and optimised process for data acquisition from digital images of herbarium specimens' (StanDAP-Herb).
format Online
Article
Text
id pubmed-6174549
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-61745492018-10-11 Toward a service-based workflow for automated information extraction from herbarium specimens Kirchhoff, Agnes Bügel, Ulrich Santamaria, Eduard Reimeier, Fabian Röpert, Dominik Tebbje, Alexander Güntsch, Anton Chaves, Fernando Steinke, Karl-Heinz Berendsohn, Walter Database (Oxford) Original Article Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although the imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and develops into the principal cost factor. In order to streamline the process of capturing herbarium specimen metadata, we specified a formal extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from OCR. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded a standardised and optimised process for data acquisition from digital images of herbarium specimens (StanDAP-Herb) Project. Oxford University Press 2018-10-08 /pmc/articles/PMC6174549/ /pubmed/30295725 http://dx.doi.org/10.1093/database/bay103 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Toward a service-based workflow for automated information extraction from herbarium specimens
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6174549/
https://www.ncbi.nlm.nih.gov/pubmed/30295725
http://dx.doi.org/10.1093/database/bay103