One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document
Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not compa...
Main authors: | Parziale, Antonio, Capriolo, Giuliana, Marcelli, Angelo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321172/ https://www.ncbi.nlm.nih.gov/pubmed/34460550 http://dx.doi.org/10.3390/jimaging6100109 |
_version_ | 1783730787815784448 |
---|---|
author | Parziale, Antonio Capriolo, Giuliana Marcelli, Angelo |
author_facet | Parziale, Antonio Capriolo, Giuliana Marcelli, Angelo |
author_sort | Parziale, Antonio |
collection | PubMed |
description | Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set. |
format | Online Article Text |
id | pubmed-8321172 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-83211722021-08-26 One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document Parziale, Antonio Capriolo, Giuliana Marcelli, Angelo J Imaging Article Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set. MDPI 2020-10-13 /pmc/articles/PMC8321172/ /pubmed/34460550 http://dx.doi.org/10.3390/jimaging6100109 Text en © 2020 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) ). |
spellingShingle | Article Parziale, Antonio Capriolo, Giuliana Marcelli, Angelo One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document |
title | One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document |
title_sort | one step is not enough: a multi-step procedure for building the training set of a query by string keyword spotting system to assist the transcription of historical document |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321172/ https://www.ncbi.nlm.nih.gov/pubmed/34460550 http://dx.doi.org/10.3390/jimaging6100109 |