
Linking entities through an ontology using word embeddings and syntactic re-ranking



Bibliographic Details
Main authors: Karadeniz, İlknur; Özgür, Arzucan
Format: Online, Article, Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437991/
https://www.ncbi.nlm.nih.gov/pubmed/30917789
http://dx.doi.org/10.1186/s12859-019-2678-8
author Karadeniz, İlknur
Özgür, Arzucan
collection PubMed
description BACKGROUND: Although there is an enormous number of textual resources in the biomedical domain, manually curated resources currently cover only a small part of the existing knowledge. The vast majority of this information is in unstructured form, which contains nonstandard naming conventions. The task of named entity recognition, which is the identification of entity names from text, is not adequate without a standardization step. Linking each identified entity mention in text to an ontology/dictionary concept is an essential task to make sense of the identified entities. This paper presents an unsupervised approach for the linking of named entities to concepts in an ontology/dictionary. We propose an approach for the normalization of biomedical entities through an ontology/dictionary by using word embeddings to represent semantic spaces, and a syntactic parser to give higher weight to the most informative word in the named entity mentions. RESULTS: We applied the proposed method to two different normalization tasks: the normalization of bacteria biotope entities through the Onto-Biotope ontology and the normalization of adverse drug reaction entities through the Medical Dictionary for Regulatory Activities (MedDRA). The proposed method achieved a precision score of 65.9%, which is 2.9 percentage points above the state-of-the-art result on the BioNLP Shared Task 2016 Bacteria Biotope test data, and a macro-averaged precision score of 68.7% on the Text Analysis Conference 2017 Adverse Drug Reaction test data. CONCLUSIONS: The core contribution of this paper is a syntax-based way of combining the individual word vectors to form vectors for the named entity mentions and ontology concepts, which can then be used to measure the similarity between them. The proposed approach is unsupervised and does not require labeled data, making it easily applicable to different domains.
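The description above combines individual word vectors into one vector per entity mention or concept name, gives extra weight to the syntactically most informative word, and then compares the resulting vectors. The sketch below illustrates that idea under explicit assumptions; it is not the authors' implementation. The head word index is passed in by hand (the paper uses a syntactic parser), the weighted average with a head weight of 2.0 and the use of cosine similarity are assumptions, and the two-dimensional embeddings and concept IDs are hypothetical toy data, not real Onto-Biotope or MedDRA entries.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

# Hypothetical toy embeddings; a real system would load vectors trained on biomedical text.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "mouse": [1.0, 0.0],
    "intestine": [0.0, 1.0],
}


def phrase_vector(tokens: Sequence[str], head_index: int,
                  embeddings: Dict[str, Vector],
                  head_weight: float = 2.0) -> Optional[Vector]:
    """Weighted average of word vectors: the head word gets head_weight, other words 1.0.

    head_index is assumed to come from a syntactic parser; here it is supplied by hand.
    """
    vectors: List[Vector] = []
    weights: List[float] = []
    for i, token in enumerate(tokens):
        vec = embeddings.get(token.lower())
        if vec is None:
            continue  # skip out-of-vocabulary words
        vectors.append(vec)
        weights.append(head_weight if i == head_index else 1.0)
    if not vectors:
        return None
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_concepts(mention: Sequence[str], mention_head: int,
                  concepts: Dict[str, Tuple[Sequence[str], int]],
                  embeddings: Dict[str, Vector],
                  head_weight: float = 2.0) -> List[Tuple[str, float]]:
    """Return (concept_id, similarity) pairs, most similar concept first."""
    m = phrase_vector(mention, mention_head, embeddings, head_weight)
    if m is None:
        return []
    scored = []
    for concept_id, (name_tokens, name_head) in concepts.items():
        c = phrase_vector(name_tokens, name_head, embeddings, head_weight)
        if c is not None:
            scored.append((concept_id, cosine(m, c)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical concept IDs with (name tokens, head index).
CONCEPTS = {
    "concept_intestine": (["intestine"], 0),
    "concept_mouse": (["mouse"], 0),
}
ranking = rank_concepts(["mouse", "intestine"], 1, CONCEPTS, TOY_EMBEDDINGS)
print(ranking[0][0])  # concept_intestine
```

With equal weights the mention "mouse intestine" is equally close to both toy concepts; weighting the head word "intestine" breaks the tie in favour of the habitat concept, which is the intuition behind giving the most informative word more weight.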
format Online
Article
Text
id pubmed-6437991
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6437991 2019-04-08 Linking entities through an ontology using word embeddings and syntactic re-ranking Karadeniz, İlknur Özgür, Arzucan BMC Bioinformatics Research Article BioMed Central 2019-03-27 /pmc/articles/PMC6437991/ /pubmed/30917789 http://dx.doi.org/10.1186/s12859-019-2678-8 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Linking entities through an ontology using word embeddings and syntactic re-ranking
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437991/
https://www.ncbi.nlm.nih.gov/pubmed/30917789
http://dx.doi.org/10.1186/s12859-019-2678-8