Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
Main authors: | Ferré, Arnaud; Ba, Mouhamadou; Bossy, Robert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Korea Genome Organization, 2019 |
Subjects: | Application Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6808633/ https://www.ncbi.nlm.nih.gov/pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20 |
author | Ferré, Arnaud; Ba, Mouhamadou; Bossy, Robert |
---|---|
collection | PubMed |
description | Entity normalization (called entity linking in the general domain) is an information extraction task that aims to annotate, or bind, words and expressions in raw text with semantic references, such as the concepts of an ontology. An ontology consists, at minimum, of a formally organized vocabulary or hierarchy of terms that captures the knowledge of a domain. Machine-learning methods, often coupled with distributional representations, currently achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize its predictions, and it lacks valid representations for out-of-vocabulary words. Here, we assess different methods for reducing the dimensionality of the ontology representation. We also propose calibrating parameters to make the predictions more accurate, and a specific method to address the problem of out-of-vocabulary words. |
format | Online Article Text |
id | pubmed-6808633 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Korea Genome Organization |
record_format | MEDLINE/PubMed |
spelling | pubmed-6808633; 2019-10-30; Genomics Inform; Application Note; Korea Genome Organization 2019-06-27; /pmc/articles/PMC6808633/ /pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20; Text; en; (c) 2019, Korea Genome Organization. This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data |
topic | Application Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6808633/ https://www.ncbi.nlm.nih.gov/pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20 |
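The description field above states that CONTES normalizes entities with ontology concepts from small training sets, that it does not scale well to very large ontologies, and that the authors assess methods for reducing the dimensionality of the ontology representation. The Python sketch below is a hypothetical illustration of that general idea, not the CONTES implementation. It assumes (this record does not state it) that each concept is encoded as a vector with one dimension per concept, set to 1 for the concept and its ancestors, so the vector size grows with the ontology; that PCA is one possible reduction method; that a least-squares linear map projects term embeddings into the reduced concept space; and that the predicted concept is the one with the highest cosine similarity. The toy ontology, the random 8-dimensional term embeddings, and all function names are invented for illustration. The code requires NumPy.

```python
import numpy as np

# Hypothetical toy ontology (child -> parent), invented for illustration only.
PARENTS = {
    "organism": None,
    "bacterium": "organism",
    "gram_positive": "bacterium",
    "gram_negative": "bacterium",
    "plant": "organism",
}
CONCEPTS = list(PARENTS)


def ancestors_or_self(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENTS[concept]
    return chain


def ontology_vectors(concepts):
    """Assumed encoding: one dimension per concept; a concept's vector has 1.0
    on itself and on each ancestor, so its size grows with the ontology."""
    index = {c: i for i, c in enumerate(concepts)}
    m = np.zeros((len(concepts), len(concepts)))
    for c in concepts:
        for a in ancestors_or_self(c):
            m[index[c], index[a]] = 1.0
    return m


def pca_reduce(m, k):
    """One possible reduction: keep the top-k principal components (PCA via SVD)."""
    centered = m - m.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def fit_projection(term_vecs, target_vecs):
    """Least-squares linear map, with a bias column, from term space to concept space."""
    x = np.hstack([term_vecs, np.ones((term_vecs.shape[0], 1))])
    w, *_ = np.linalg.lstsq(x, target_vecs, rcond=None)
    return w


def normalize(term_vec, w, concept_vecs, concepts):
    """Project one term vector and return the concept with the highest cosine similarity."""
    y = np.append(term_vec, 1.0) @ w
    norms = np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(y)
    sims = concept_vecs @ y / np.maximum(norms, 1e-12)
    return concepts[int(np.argmax(sims))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = ontology_vectors(CONCEPTS)   # shape (5, 5): one column per concept
    reduced = pca_reduce(full, k=3)     # shape (5, 3)
    # Hypothetical 8-dimensional embeddings, one training term per concept.
    terms = rng.normal(size=(len(CONCEPTS), 8))
    w = fit_projection(terms, reduced)
    for i, concept in enumerate(CONCEPTS):
        print(concept, "->", normalize(terms[i], w, reduced, CONCEPTS))
```

PCA here is only a stand-in for "a method to reduce dimensionality"; this record does not name the methods that were assessed. Because the toy setup has one training term per concept and more parameters (8 embedding dimensions plus a bias) than training terms, the least-squares fit is exact and each training term maps back to its own concept; real data would not behave so neatly.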