Toward a service-based workflow for automated information extraction from herbarium specimens
In recent years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor...
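Main Authors: | Kirchhoff, Agnes; Bügel, Ulrich; Santamaria, Eduard; Reimeier, Fabian; Röpert, Dominik; Tebbje, Alexander; Güntsch, Anton; Chaves, Fernando; Steinke, Karl-Heinz; Berendsohn, Walter |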
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Original Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6174549/ https://www.ncbi.nlm.nih.gov/pubmed/30295725 http://dx.doi.org/10.1093/database/bay103 |
_version_ | 1783361294490927104 |
---|---|
author | Kirchhoff, Agnes; Bügel, Ulrich; Santamaria, Eduard; Reimeier, Fabian; Röpert, Dominik; Tebbje, Alexander; Güntsch, Anton; Chaves, Fernando; Steinke, Karl-Heinz; Berendsohn, Walter |
author_facet | Kirchhoff, Agnes; Bügel, Ulrich; Santamaria, Eduard; Reimeier, Fabian; Röpert, Dominik; Tebbje, Alexander; Güntsch, Anton; Chaves, Fernando; Steinke, Karl-Heinz; Berendsohn, Walter |
author_sort | Kirchhoff, Agnes |
collection | PubMed |
description | In recent years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor. To streamline the capture of herbarium specimen metadata, we specified a formal, extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) text from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from the OCR output. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded project 'A standardised and optimised process for data acquisition from digital images of herbarium specimens' (StanDAP-Herb). |
format | Online Article Text |
id | pubmed-6174549 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-61745492018-10-11 Toward a service-based workflow for automated information extraction from herbarium specimens Kirchhoff, Agnes Bügel, Ulrich Santamaria, Eduard Reimeier, Fabian Röpert, Dominik Tebbje, Alexander Güntsch, Anton Chaves, Fernando Steinke, Karl-Heinz Berendsohn, Walter Database (Oxford) Original Article Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although the imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and develops into the principal cost factor. In order to streamline the process of capturing herbarium specimen metadata, we specified a formal extensible workflow integrating a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from OCR. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded a standardised and optimised process for data acquisition from digital images of herbarium specimens (StanDAP-Herb) Project. Oxford University Press 2018-10-08 /pmc/articles/PMC6174549/ /pubmed/30295725 http://dx.doi.org/10.1093/database/bay103 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Kirchhoff, Agnes Bügel, Ulrich Santamaria, Eduard Reimeier, Fabian Röpert, Dominik Tebbje, Alexander Güntsch, Anton Chaves, Fernando Steinke, Karl-Heinz Berendsohn, Walter Toward a service-based workflow for automated information extraction from herbarium specimens |
title | Toward a service-based workflow for automated information extraction from herbarium specimens |
title_full | Toward a service-based workflow for automated information extraction from herbarium specimens |
title_fullStr | Toward a service-based workflow for automated information extraction from herbarium specimens |
title_full_unstemmed | Toward a service-based workflow for automated information extraction from herbarium specimens |
title_short | Toward a service-based workflow for automated information extraction from herbarium specimens |
title_sort | toward a service-based workflow for automated information extraction from herbarium specimens |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6174549/ https://www.ncbi.nlm.nih.gov/pubmed/30295725 http://dx.doi.org/10.1093/database/bay103 |
work_keys_str_mv | AT kirchhoffagnes towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT bugelulrich towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT santamariaeduard towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT reimeierfabian towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT ropertdominik towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT tebbjealexander towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT guntschanton towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT chavesfernando towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT steinkekarlheinz towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens AT berendsohnwalter towardaservicebasedworkflowforautomatedinformationextractionfromherbariumspecimens |