
Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †

The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating an urgent need for systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appear...


Bibliographic Details
Main Authors: Bergamaschi, Sonia, De Nardis, Stefania, Martoglia, Riccardo, Ruozzi, Federico, Sala, Luca, Vanzini, Matteo, Vigliermo, Riccardo Amerigo
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9182969/
https://www.ncbi.nlm.nih.gov/pubmed/35684615
http://dx.doi.org/10.3390/s22113995
_version_ 1784724171485347840
author Bergamaschi, Sonia
De Nardis, Stefania
Martoglia, Riccardo
Ruozzi, Federico
Sala, Luca
Vanzini, Matteo
Vigliermo, Riccardo Amerigo
collection PubMed
description The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating an urgent need for systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. To achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from an interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in languages that use non-Latin scripts (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques to provide both highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state-of-the-art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques that further leverage the extracted data/metadata and let the system learn from user feedback) and of the many foreseen advantages of this research, from both a technical and a broader cultural-preservation and sharing point of view.
format Online
Article
Text
id pubmed-9182969
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9182969 2022-06-10 Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach † Bergamaschi, Sonia De Nardis, Stefania Martoglia, Riccardo Ruozzi, Federico Sala, Luca Vanzini, Matteo Vigliermo, Riccardo Amerigo Sensors (Basel) Article
MDPI 2022-05-25 /pmc/articles/PMC9182969/ /pubmed/35684615 http://dx.doi.org/10.3390/s22113995 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9182969/
https://www.ncbi.nlm.nih.gov/pubmed/35684615
http://dx.doi.org/10.3390/s22113995