Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information

Bibliographic Details
Main authors: Díez Platas, Mª Luisa, Ros Muñoz, Salvador, González‐Blanco, Elena, Ruiz Fabo, Pablo, Álvarez Mellado, Elena
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2020
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7891424/
https://www.ncbi.nlm.nih.gov/pubmed/33665231
http://dx.doi.org/10.1002/asi.24399
_version_ 1783652696519081984
author Díez Platas, Mª Luisa
Ros Muñoz, Salvador
González‐Blanco, Elena
Ruiz Fabo, Pablo
Álvarez Mellado, Elena
collection PubMed
description The recognition of named entities in medieval Spanish texts is highly complex and poses specific challenges: first, the complex morphosyntactic characteristics of proper‐noun use in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex text structures. For example, it was common to add nicknames and information about a person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign them a type. Because entities occur with attached attributes, entity contexts are also parsed to determine entity‐type‐specific dependencies for these attributes. Moreover, the system uses a variant generator to handle the diachronic evolution of medieval Spanish terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its own lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes, with an overall F1 of 0.75.
format Online
Article
Text
id pubmed-7891424
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-7891424 2021-03-02 J Assoc Inf Sci Technol Research Articles John Wiley & Sons, Inc. 2020-08-19 2021-02 /pmc/articles/PMC7891424/ /pubmed/33665231 http://dx.doi.org/10.1002/asi.24399 Text en © 2020 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of Association for Information Science and Technology.
This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7891424/
https://www.ncbi.nlm.nih.gov/pubmed/33665231
http://dx.doi.org/10.1002/asi.24399
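
The abstract mentions a variant generator and semantic contextual cues, but this record gives no implementation details. As a minimal sketch only, assuming a handful of illustrative spelling rules and cue words (every rule, cue, and function name below is hypothetical and not taken from the article), the two ideas might look like this in Python:

```python
import re

# Hypothetical one-way spelling rules, for illustration only; the article's
# actual variant rules are not listed in this record.
SPELLING_RULES = [("nn", "ñ"), ("ç", "z"), ("ss", "s")]

# Hypothetical cue words that precede an entity and suggest its type.
CUES = {"don": "PERSON", "rey": "PERSON", "villa de": "PLACE"}


def generate_variants(term):
    """Return the term plus the spellings produced by applying the rules."""
    variants = {term}
    for old, new in SPELLING_RULES:
        # Extend the set with rewritten copies of every form found so far.
        variants |= {form.replace(old, new) for form in variants}
    return variants


def find_entities(text):
    """Return (name, type) pairs for capitalized words that follow a cue."""
    entities = []
    for cue, entity_type in CUES.items():
        # The cue matches case-insensitively; the name must start uppercase.
        pattern = r"\b(?i:" + re.escape(cue) + r")\s+([A-ZÁÉÍÓÚÑ]\w+)"
        for match in re.finditer(pattern, text):
            entities.append((match.group(1), entity_type))
    return entities


if __name__ == "__main__":
    print(generate_variants("sennor"))
    print(find_entities("el rey Alfonso e don Rodrigo en la villa de Burgos"))
```

In this sketch the variants would be used to match a medieval spelling against a lexicon or gazetteer entry, while the cue words assign a type to the name that follows them. The system described in the abstract also parses attributes and enriches its lexica iteratively, which this example does not attempt.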