Learning adaptive representations for entity recognition in the biomedical domain

BACKGROUND: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learnin...

Bibliographic Details
Main Authors:	Lauriola, Ivano; Aiolli, Fabio; Lavelli, Alberto; Rinaldi, Fabio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8127187/ https://www.ncbi.nlm.nih.gov/pubmed/34001263 http://dx.doi.org/10.1186/s13326-021-00238-0

author	Lauriola, Ivano; Aiolli, Fabio; Lavelli, Alberto; Rinaldi, Fabio
collection	PubMed
description	BACKGROUND: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step in these applications is the choice of the representation used to describe the data. Several representations have been proposed in the literature. Some rely on strong domain knowledge and consist of features manually defined by domain experts. These representations usually describe the problem well, but they require considerable human effort and annotated data. On the other hand, general-purpose representations such as word embeddings do not require human domain knowledge, but they may be too general for a specific task. RESULTS: This paper investigates methods to learn the best representation directly from data by combining several knowledge-based representations and word embeddings. Two mechanisms are considered for performing the combination: neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score. CONCLUSIONS: Our experiments show that the principled combination of general, domain-specific, word-level, and character-level representations improves the performance of entity recognition. We also discuss the contribution of each representation to the final solution.
format	Online Article Text
id	pubmed-8127187
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8127187 2021-05-17 Learning adaptive representations for entity recognition in the biomedical domain Lauriola, Ivano; Aiolli, Fabio; Lavelli, Alberto; Rinaldi, Fabio J Biomed Semantics Research
BioMed Central 2021-05-17 /pmc/articles/PMC8127187/ /pubmed/34001263 http://dx.doi.org/10.1186/s13326-021-00238-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Learning adaptive representations for entity recognition in the biomedical domain
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8127187/ https://www.ncbi.nlm.nih.gov/pubmed/34001263 http://dx.doi.org/10.1186/s13326-021-00238-0
