Exploiting domain information for Word Sense Disambiguation of medical documents

OBJECTIVE: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods b...

Bibliographic Details
Main authors: Stevenson, Mark; Agirre, Eneko; Soroa, Aitor
Format: Online Article Text
Language: English
Published: BMJ Group 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3277615/
https://www.ncbi.nlm.nih.gov/pubmed/21900701
http://dx.doi.org/10.1136/amiajnl-2011-000415
author Stevenson, Mark
Agirre, Eneko
Soroa, Aitor
collection PubMed
description OBJECTIVE: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears. DESIGN: The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context. MEASUREMENTS: A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset. RESULTS AND DISCUSSION: The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained using key terms obtained by relevance feedback and weighted by inverse document frequency.
format Online
Article
Text
id pubmed-3277615
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BMJ Group
record_format MEDLINE/PubMed
spelling pubmed-3277615 2012-02-13 Exploiting domain information for Word Sense Disambiguation of medical documents Stevenson, Mark Agirre, Eneko Soroa, Aitor J Am Med Inform Assoc Research and Applications BMJ Group 2011-09-07 2012 /pmc/articles/PMC3277615/ /pubmed/21900701 http://dx.doi.org/10.1136/amiajnl-2011-000415 Text en © 2012, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
title Exploiting domain information for Word Sense Disambiguation of medical documents
topic Research and Applications