
Supervised Learning and Knowledge-Based Approaches Applied to Biomedical Word Sense Disambiguation

Bibliographic Details
Main authors: Antunes, Rui; Matos, Sérgio
Format: Online, Article, Text
Language: English
Published: De Gruyter, 2017
Subjects: Original Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6042812/
https://www.ncbi.nlm.nih.gov/pubmed/29236676
http://dx.doi.org/10.1515/jib-2017-0051
collection PubMed
description Word sense disambiguation (WSD), which assigns an unequivocal concept to an ambiguous term, is an important step in biomedical text mining and improves the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, obtaining the best results with the supervised one. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, concept textual definitions extracted from the UMLS database, and concept association values calculated from MeSH co-occurrence counts in MEDLINE articles. Also, in the knowledge-based method, we tested different word embedding averaging functions to calculate the surrounding context vectors, with the goal of giving more importance to the words closest to the ambiguous term. Our methods were evaluated on the MSH WSD dataset, the most commonly used dataset for evaluating biomedical concept disambiguation. We obtained a top accuracy of 95.6% with the supervised method, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved the disambiguation accuracy, demonstrating that they are a powerful resource for the WSD task.
id pubmed-6042812
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6042812 2019-01-28; J Integr Bioinform; Original Articles; De Gruyter 2017-12-13; ©2017, Rui Antunes and Sérgio Matos, published by De Gruyter, Berlin/Boston. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (http://creativecommons.org/licenses/by-nc-nd/3.0).
topic Original Articles
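The knowledge-based method summarized in the description above can be illustrated with a minimal sketch. It assumes a 1/distance weighting for the context average (so closer words count more), cosine similarity between the context vector and the averaged embeddings of each candidate concept's textual definition, and a simple weighted sum with a hypothetical association score (`association`, `alpha`). The record does not specify the authors' exact averaging functions or combination rule, and the embeddings, concept labels (`COMMON_COLD`, `COLD_TEMPERATURE`) and definitions below are made-up toy values, not UMLS, MeSH or MEDLINE data.

```python
import math

# Toy 3-dimensional "embeddings", made up for illustration only.
EMB = {
    "virus": [1.0, 0.0, 0.0], "infection": [0.9, 0.1, 0.0],
    "viral": [0.95, 0.05, 0.0], "symptoms": [0.8, 0.2, 0.0],
    "patient": [0.7, 0.3, 0.0], "temperature": [0.0, 1.0, 0.0],
    "low": [0.1, 0.9, 0.0], "exposure": [0.1, 0.8, 0.1],
    "weather": [0.0, 0.9, 0.1],
}
# Hypothetical candidate concepts (with toy definitions) for the term "cold".
DEFS = {
    "COMMON_COLD": "a viral infection of the upper respiratory tract with symptoms",
    "COLD_TEMPERATURE": "low temperature or exposure to cold weather",
}


def _dim(embeddings):
    # All vectors share one dimensionality; read it from any entry.
    return len(next(iter(embeddings.values())))


def weighted_context_vector(tokens, target_index, embeddings, window=5):
    """Distance-weighted average of the embeddings around the ambiguous term.

    Each in-vocabulary word within `window` positions gets weight 1/distance
    (an assumed weighting), so words closer to the ambiguous term count more.
    """
    total = [0.0] * _dim(embeddings)
    weight_sum = 0.0
    for i, token in enumerate(tokens):
        distance = abs(i - target_index)
        # Skip the ambiguous term itself, words outside the window and OOV words.
        if distance == 0 or distance > window or token not in embeddings:
            continue
        weight = 1.0 / distance
        for d, value in enumerate(embeddings[token]):
            total[d] += weight * value
        weight_sum += weight
    return [x / weight_sum for x in total] if weight_sum else total


def mean_vector(words, embeddings):
    """Unweighted average of the embeddings of a definition's known words."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * _dim(embeddings)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(tokens, target_index, definitions, embeddings,
                 association=None, alpha=0.0):
    """Return the concept whose definition vector best matches the context.

    `association` (hypothetical) maps concept -> association value; it is
    added with weight `alpha`, an assumed way of combining the two signals.
    """
    context = weighted_context_vector(tokens, target_index, embeddings)
    scores = {}
    for concept, definition in definitions.items():
        score = cosine(context, mean_vector(definition.lower().split(), embeddings))
        if association is not None:
            score += alpha * association.get(concept, 0.0)
        scores[concept] = score
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    sentence = "the patient caught a cold from a virus infection".split()
    best, scores = disambiguate(sentence, sentence.index("cold"), DEFS, EMB)
    print(best, scores)
```

With `alpha=0.0` the decision rests on the embeddings alone; a positive `alpha` lets the hypothetical association value outweigh weak textual evidence, which is one plausible reading of "combined" but not necessarily the authors' rule.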