Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

Bibliographic Details
Main authors: Blair, David R.; Wang, Kanix; Nestorov, Svetlozar; Evans, James A.; Rzhetsky, Andrey
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177665/
https://www.ncbi.nlm.nih.gov/pubmed/25255227
http://dx.doi.org/10.1371/journal.pcbi.1003799
author Blair, David R.
Wang, Kanix
Nestorov, Svetlozar
Evans, James A.
Rzhetsky, Andrey
author_sort Blair, David R.
collection PubMed
description Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies.
format Online
Article
Text
id pubmed-4177665
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4177665 2014-10-02 Quantifying the Impact and Extent of Undocumented Biomedical Synonymy Blair, David R. Wang, Kanix Nestorov, Svetlozar Evans, James A. Rzhetsky, Andrey PLoS Comput Biol Research Article Public Library of Science 2014-09-25 /pmc/articles/PMC4177665/ /pubmed/25255227 http://dx.doi.org/10.1371/journal.pcbi.1003799 Text en © 2014 Blair et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Quantifying the Impact and Extent of Undocumented Biomedical Synonymy
topic Research Article