End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code

The growth of digital libraries has yielded a large number of handwritten historical documents in the form of images, often accompanied by a digital transcription of the content. The ability to track the position of the words of the digital transcription in the images can be important both for the s...

Bibliographic Details
Main authors: De Gregorio, Giuseppe, Capriolo, Giuliana, Marcelli, Angelo
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9864051/
https://www.ncbi.nlm.nih.gov/pubmed/36662115
http://dx.doi.org/10.3390/jimaging9010017
_version_ 1784875487269486592
author De Gregorio, Giuseppe
Capriolo, Giuliana
Marcelli, Angelo
author_facet De Gregorio, Giuseppe
Capriolo, Giuliana
Marcelli, Angelo
author_sort De Gregorio, Giuseppe
collection PubMed
description The growth of digital libraries has yielded a large number of handwritten historical documents in the form of images, often accompanied by a digital transcription of the content. The ability to track the position of the words of the digital transcription in the images can be important both for the study of the document by humanities scholars and for further automatic processing. We propose a learning-free method for automatically aligning the transcription to the document image. The method receives as input the digital image of the document and the transcription of its content and aims at linking the transcription to the corresponding images within the page at the word level. The method comprises two main original contributions: a line-level segmentation algorithm capable of detecting text lines with curved baselines, and a text-to-image alignment algorithm capable of dealing with under- and over-segmentation errors at the word level. Experiments on pages from a 17th-century Italian manuscript have demonstrated that the line segmentation method segments 92% of the text lines correctly. They also demonstrated that the method achieves a correct alignment accuracy greater than 68%. Moreover, the performance achieved on widely used data sets compares favourably with the state of the art.
format Online
Article
Text
id pubmed-9864051
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9864051 2023-01-22 End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code De Gregorio, Giuseppe Capriolo, Giuliana Marcelli, Angelo J Imaging Article The growth of digital libraries has yielded a large number of handwritten historical documents in the form of images, often accompanied by a digital transcription of the content. The ability to track the position of the words of the digital transcription in the images can be important both for the study of the document by humanities scholars and for further automatic processing. We propose a learning-free method for automatically aligning the transcription to the document image. The method receives as input the digital image of the document and the transcription of its content and aims at linking the transcription to the corresponding images within the page at the word level. The method comprises two main original contributions: a line-level segmentation algorithm capable of detecting text lines with curved baselines, and a text-to-image alignment algorithm capable of dealing with under- and over-segmentation errors at the word level. Experiments on pages from a 17th-century Italian manuscript have demonstrated that the line segmentation method segments 92% of the text lines correctly. They also demonstrated that the method achieves a correct alignment accuracy greater than 68%. Moreover, the performance achieved on widely used data sets compares favourably with the state of the art. MDPI 2023-01-13 /pmc/articles/PMC9864051/ /pubmed/36662115 http://dx.doi.org/10.3390/jimaging9010017 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
De Gregorio, Giuseppe
Capriolo, Giuliana
Marcelli, Angelo
End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
title End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
title_full End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
title_fullStr End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
title_full_unstemmed End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
title_short End-to-End Transcript Alignment of 17th Century Manuscripts: The Case of Moccia Code
title_sort end-to-end transcript alignment of 17th century manuscripts: the case of moccia code
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9864051/
https://www.ncbi.nlm.nih.gov/pubmed/36662115
http://dx.doi.org/10.3390/jimaging9010017
work_keys_str_mv AT degregoriogiuseppe endtoendtranscriptalignmentof17thcenturymanuscriptsthecaseofmocciacode
AT capriologiuliana endtoendtranscriptalignmentof17thcenturymanuscriptsthecaseofmocciacode
AT marcelliangelo endtoendtranscriptalignmentof17thcenturymanuscriptsthecaseofmocciacode