Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data

Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy...

Bibliographic Details
Main authors: Ferré, Arnaud; Ba, Mouhamadou; Bossy, Robert
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6808633/
https://www.ncbi.nlm.nih.gov/pubmed/31307135
http://dx.doi.org/10.5808/GI.2019.17.2.e20
author Ferré, Arnaud
Ba, Mouhamadou
Bossy, Robert
collection PubMed
description Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
format Online
Article
Text
id pubmed-6808633
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-6808633 2019-10-30 Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data Ferré, Arnaud Ba, Mouhamadou Bossy, Robert Genomics Inform Application Note Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method. Korea Genome Organization 2019-06-27 /pmc/articles/PMC6808633/ /pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20 Text en (c) 2019, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
topic Application Note