
The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels

Abstract. At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add da...


Bibliographic Details
Main Authors: Drinkwater, Robyn E., Cubey, Robert W. N., Haston, Elspeth M.
Format: Online Article Text
Language: English
Published: Pensoft Publishers 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086207/
https://www.ncbi.nlm.nih.gov/pubmed/25009435
http://dx.doi.org/10.3897/phytokeys.38.7168
_version_ 1782324783024701440
author Drinkwater, Robyn E.
Cubey, Robert W. N.
Haston, Elspeth M.
author_facet Drinkwater, Robyn E.
Cubey, Robert W. N.
Haston, Elspeth M.
author_sort Drinkwater, Robyn E.
collection PubMed
description Abstract. At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR aid the digitisation process, they completed a series of trials which compared the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinion of the digitisation staff to the different sorting options. In total 7,200 specimens were processed. When compared to an unsorted, random set of specimens, those which were sorted based on data added from the OCR were quicker to digitise. Of the methods tested here, the most successful in terms of efficiency used a protocol which required entering data into a limited set of fields and where the records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, and so a familiarity with the Collector or Country is rapidly established.
format Online
Article
Text
id pubmed-4086207
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-4086207 2014-07-09 The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels Drinkwater, Robyn E. Cubey, Robert W. N. Haston, Elspeth M. PhytoKeys Research Article Abstract. At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR aid the digitisation process, they completed a series of trials which compared the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinion of the digitisation staff to the different sorting options. In total 7,200 specimens were processed. When compared to an unsorted, random set of specimens, those which were sorted based on data added from the OCR were quicker to digitise. Of the methods tested here, the most successful in terms of efficiency used a protocol which required entering data into a limited set of fields and where the records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, and so a familiarity with the Collector or Country is rapidly established. Pensoft Publishers 2014-05-19 /pmc/articles/PMC4086207/ /pubmed/25009435 http://dx.doi.org/10.3897/phytokeys.38.7168 Text en Robyn E. Drinkwater, Robert W. N. Cubey, Elspeth M. Haston http://creativecommons.org/licenses/by/4.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Drinkwater, Robyn E.
Cubey, Robert W. N.
Haston, Elspeth M.
The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels
title The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels
title_sort use of optical character recognition (ocr) in the digitisation of herbarium specimen labels
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086207/
https://www.ncbi.nlm.nih.gov/pubmed/25009435
http://dx.doi.org/10.3897/phytokeys.38.7168