Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †

The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appear...

Bibliographic Details
Main Authors:	Bergamaschi, Sonia; De Nardis, Stefania; Martoglia, Riccardo; Ruozzi, Federico; Sala, Luca; Vanzini, Matteo; Vigliermo, Riccardo Amerigo
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9182969/ https://www.ncbi.nlm.nih.gov/pubmed/35684615 http://dx.doi.org/10.3390/s22113995

_version_	1784724171485347840
author	Bergamaschi, Sonia; De Nardis, Stefania; Martoglia, Riccardo; Ruozzi, Federico; Sala, Luca; Vanzini, Matteo; Vigliermo, Riccardo Amerigo
author_facet	Bergamaschi, Sonia; De Nardis, Stefania; Martoglia, Riccardo; Ruozzi, Federico; Sala, Luca; Vanzini, Matteo; Vigliermo, Riccardo Amerigo
author_sort	Bergamaschi, Sonia
collection	PubMed
description	The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state of the art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view.
format	Online Article Text
id	pubmed-9182969
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-91829692022-06-10 Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach † Bergamaschi, Sonia De Nardis, Stefania Martoglia, Riccardo Ruozzi, Federico Sala, Luca Vanzini, Matteo Vigliermo, Riccardo Amerigo Sensors (Basel) Article The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state of the art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view. 
MDPI 2022-05-25 /pmc/articles/PMC9182969/ /pubmed/35684615 http://dx.doi.org/10.3390/s22113995 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Bergamaschi, Sonia De Nardis, Stefania Martoglia, Riccardo Ruozzi, Federico Sala, Luca Vanzini, Matteo Vigliermo, Riccardo Amerigo Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
title	Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
title_full	Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
title_fullStr	Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
title_full_unstemmed	Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
title_short	Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach †
title_sort	novel perspectives for the management of multilingual and multialphabetic heritages through automatic knowledge extraction: the digitalmaktaba approach †
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9182969/ https://www.ncbi.nlm.nih.gov/pubmed/35684615 http://dx.doi.org/10.3390/s22113995