CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents

Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research is yet to reap the benefits of such advances. This is generally due to the low number of large, coh...

Bibliographic Details
Main Authors: Büttner, Jochen; Martinetz, Julius; El-Hajj, Hassan; Valleriani, Matteo
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9605005/
https://www.ncbi.nlm.nih.gov/pubmed/36286379
http://dx.doi.org/10.3390/jimaging8100285
_version_ 1784817957949407232
author Büttner, Jochen
Martinetz, Julius
El-Hajj, Hassan
Valleriani, Matteo
author_facet Büttner, Jochen
Martinetz, Julius
El-Hajj, Hassan
Valleriani, Matteo
author_sort Büttner, Jochen
collection PubMed
description Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research is yet to reap the benefits of such advances. This is generally due to the low number of large, coherent, and annotated datasets of historical documents, as well as the overwhelming focus on Optical Character Recognition to support the analysis of historical documents. In this paper, we highlight the importance of visual elements, in particular illustrations in historical documents, and offer a public multi-class historical visual element dataset based on the Sphaera corpus. Additionally, we train an image extraction model based on YOLO architecture and publish it through a publicly available web-service to detect and extract multi-class images from historical documents in an effort to bridge the gap between traditional and computational approaches in historical studies.
format Online
Article
Text
id pubmed-9605005
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9605005 2022-10-27 CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents Büttner, Jochen Martinetz, Julius El-Hajj, Hassan Valleriani, Matteo J Imaging Article Recent advances in object detection facilitated by deep learning have led to numerous solutions in a myriad of fields ranging from medical diagnosis to autonomous driving. However, historical research is yet to reap the benefits of such advances. This is generally due to the low number of large, coherent, and annotated datasets of historical documents, as well as the overwhelming focus on Optical Character Recognition to support the analysis of historical documents. In this paper, we highlight the importance of visual elements, in particular illustrations in historical documents, and offer a public multi-class historical visual element dataset based on the Sphaera corpus. Additionally, we train an image extraction model based on YOLO architecture and publish it through a publicly available web-service to detect and extract multi-class images from historical documents in an effort to bridge the gap between traditional and computational approaches in historical studies. MDPI 2022-10-15 /pmc/articles/PMC9605005/ /pubmed/36286379 http://dx.doi.org/10.3390/jimaging8100285 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Büttner, Jochen
Martinetz, Julius
El-Hajj, Hassan
Valleriani, Matteo
CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
title CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
title_full CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
title_fullStr CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
title_full_unstemmed CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
title_short CorDeep and the Sacrobosco Dataset: Detection of Visual Elements in Historical Documents
title_sort cordeep and the sacrobosco dataset: detection of visual elements in historical documents
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9605005/
https://www.ncbi.nlm.nih.gov/pubmed/36286379
http://dx.doi.org/10.3390/jimaging8100285
work_keys_str_mv AT buttnerjochen cordeepandthesacroboscodatasetdetectionofvisualelementsinhistoricaldocuments
AT martinetzjulius cordeepandthesacroboscodatasetdetectionofvisualelementsinhistoricaldocuments
AT elhajjhassan cordeepandthesacroboscodatasetdetectionofvisualelementsinhistoricaldocuments
AT vallerianimatteo cordeepandthesacroboscodatasetdetectionofvisualelementsinhistoricaldocuments