Linking entities through an ontology using word embeddings and syntactic re-ranking
Main authors: | Karadeniz, İlknur; Özgür, Arzucan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437991/ https://www.ncbi.nlm.nih.gov/pubmed/30917789 http://dx.doi.org/10.1186/s12859-019-2678-8 |
author | Karadeniz, İlknur Özgür, Arzucan |
---|---|
collection | PubMed |
description | BACKGROUND: Although there is an enormous number of textual resources in the biomedical domain, manually curated resources currently cover only a small part of the existing knowledge. The vast majority of this information is in unstructured form, which contains nonstandard naming conventions. Named entity recognition, the identification of entity names in text, is not adequate without a standardization step. Linking each identified entity mention in text to an ontology/dictionary concept is essential to make sense of the identified entities. This paper presents an unsupervised approach for linking named entities to concepts in an ontology/dictionary. We propose an approach for the normalization of biomedical entities through an ontology/dictionary that uses word embeddings to represent semantic spaces and a syntactic parser to give higher weight to the most informative word in the named entity mentions. RESULTS: We applied the proposed method to two different normalization tasks: the normalization of bacteria biotope entities through the Onto-Biotope ontology and the normalization of adverse drug reaction entities through the Medical Dictionary for Regulatory Activities (MedDRA). The proposed method achieved a precision score of 65.9% on the BioNLP Shared Task 2016 Bacteria Biotope test data, which is 2.9 percentage points above the state-of-the-art result, and a macro-averaged precision score of 68.7% on the Text Analysis Conference 2017 Adverse Drug Reaction test data. CONCLUSIONS: The core contribution of this paper is a syntax-based way of combining the individual word vectors to form vectors for the named entity mentions and ontology concepts, which can then be used to measure the similarity between them. The proposed approach is unsupervised and does not require labeled data, making it easily applicable to different domains. |
format | Online Article Text |
id | pubmed-6437991 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6437991 2019-04-08 Linking entities through an ontology using word embeddings and syntactic re-ranking Karadeniz, İlknur Özgür, Arzucan BMC Bioinformatics Research Article BioMed Central 2019-03-27 /pmc/articles/PMC6437991/ /pubmed/30917789 http://dx.doi.org/10.1186/s12859-019-2678-8 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Linking entities through an ontology using word embeddings and syntactic re-ranking |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437991/ https://www.ncbi.nlm.nih.gov/pubmed/30917789 http://dx.doi.org/10.1186/s12859-019-2678-8 |
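The description field summarizes the method only at a high level. Word vectors are combined into one vector per entity mention and one per ontology concept, the most informative (head) word is given a higher weight, and concepts are ranked by their similarity to the mention. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The three-dimensional embeddings and the concept IDs are made up. The head word is approximated as the last token rather than found with a syntactic parser. The head weight of 2.0 is arbitrary, and cosine similarity is assumed as the similarity measure.

```python
import math

# Hypothetical 3-dimensional toy embeddings. A real system would load
# pretrained word vectors; these values are made up for illustration.
EMBEDDINGS = {
    "skin": [0.9, 0.1, 0.0],
    "rash": [0.1, 0.9, 0.1],
    "cutaneous": [0.85, 0.2, 0.05],
    "eruption": [0.15, 0.85, 0.2],
    "pain": [0.0, 0.2, 0.9],
}

# Assumed extra weight for the head word; every other word gets weight 1.0.
HEAD_WEIGHT = 2.0


def head_index(tokens):
    """Return the position of the head word among the in-vocabulary tokens.

    Assumption: the last token is treated as the head, a common heuristic
    for English noun phrases. It stands in for a syntactic parser here.
    """
    return len(tokens) - 1


def phrase_vector(phrase):
    """Weighted average of word vectors, with the head word weighted higher.

    Out-of-vocabulary words are skipped; returns None if no word is known.
    """
    tokens = [t for t in phrase.lower().split() if t in EMBEDDINGS]
    if not tokens:
        return None
    dim = len(EMBEDDINGS[tokens[0]])
    total = [0.0] * dim
    weight_sum = 0.0
    head = head_index(tokens)
    for i, tok in enumerate(tokens):
        weight = HEAD_WEIGHT if i == head else 1.0
        for d, x in enumerate(EMBEDDINGS[tok]):
            total[d] += weight * x
        weight_sum += weight
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_concepts(mention, concepts):
    """Return concept IDs sorted from most to least similar to the mention."""
    mention_vec = phrase_vector(mention)
    if mention_vec is None:
        return []
    scored = []
    for concept_id, name in concepts.items():
        concept_vec = phrase_vector(name)
        if concept_vec is not None:
            scored.append((cosine(mention_vec, concept_vec), concept_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [concept_id for _, concept_id in scored]


# Hypothetical concept IDs and names standing in for ontology entries.
concepts = {"C_ERUPTION": "cutaneous eruption", "C_PAIN": "skin pain"}
print(rank_concepts("skin rash", concepts))  # → ['C_ERUPTION', 'C_PAIN']
```

Cosine similarity is scale-invariant, so dividing by the weight sum does not affect the ranking; it only keeps each phrase vector an average. A fuller version would swap `head_index` for the head word returned by a dependency parser and replace the toy dictionary with pretrained embeddings.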