Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data

Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy...

Bibliographic Details
Main Authors:	Ferré, Arnaud; Ba, Mouhamadou; Bossy, Robert
Format:	Online Article Text
Language:	English
Published:	Korea Genome Organization 2019
Subjects:	Application Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6808633/ https://www.ncbi.nlm.nih.gov/pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20

_version_	1783461782498574336
author	Ferré, Arnaud Ba, Mouhamadou Bossy, Robert
author_sort	Ferré, Arnaud
collection	PubMed
description	Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality of the ontology representation. We also propose to calibrate parameters to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
format	Online Article Text
id	pubmed-6808633
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Korea Genome Organization
record_format	MEDLINE/PubMed
spelling	pubmed-68086332019-10-30 Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data Ferré, Arnaud Ba, Mouhamadou Bossy, Robert Genomics Inform Application Note Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations, such as it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for the out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words, with a specific method. Korea Genome Organization 2019-06-27 /pmc/articles/PMC6808633/ /pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20 Text en (c) 2019, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license(https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Application Note Ferré, Arnaud Ba, Mouhamadou Bossy, Robert Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
title	Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
topic	Application Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6808633/ https://www.ncbi.nlm.nih.gov/pubmed/31307135 http://dx.doi.org/10.5808/GI.2019.17.2.e20

Similar Items