
Normalizing biomedical terms by minimizing ambiguity and variability

Bibliographic Details
Main authors: Tsuruoka, Yoshimasa, McNaught, John, Ananiadou, Sophia
Format: Text
Language: English
Published: BioMed Central 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2352870/
https://www.ncbi.nlm.nih.gov/pubmed/18426547
http://dx.doi.org/10.1186/1471-2105-9-S3-S2
collection PubMed
description BACKGROUND: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherently heavy computational cost discourages its use when the dictionaries are large or when real-time processing is required. A less computationally demanding approach is to normalize the terms using heuristic rules, which enables us to look up a dictionary in constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and is thus the bottleneck of the normalization approach.
RESULTS: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS.
CONCLUSIONS: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and that the computational overhead of rule application is small enough to allow a very fast implementation. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction, especially when good normalization heuristics for the target terminology are not fully known.
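The abstract describes two ideas. First, normalizing terms with rules turns dictionary lookup into a single hash-table hit. Second, those rules can be discovered automatically by trading ambiguity off against variability.

The Python sketch below illustrates both ideas, but everything specific in it is an assumption made for illustration, not the authors' published algorithm or data. That includes:
- the candidate rules (`lowercase`, `drop_hyphens`, `drop_spaces`, `drop_digits`);
- the two counts used as "ambiguity" and "variability";
- the greedy search with its hypothetical `ambiguity_weight` parameter;
- the toy dictionary with made-up concept IDs `C1` to `C3`.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Rule = Callable[[str], str]
Entry = Tuple[str, str]  # (surface term, concept ID)

# Hypothetical candidate rules; not the paper's actual rule space.
def lowercase(s: str) -> str:
    return s.lower()

def drop_hyphens(s: str) -> str:
    return s.replace("-", "")

def drop_spaces(s: str) -> str:
    return s.replace(" ", "")

def drop_digits(s: str) -> str:
    # Deliberately over-aggressive: merges names that differ only by a number.
    return re.sub(r"\d", "", s)

def normalize(term: str, rules: List[Rule]) -> str:
    """Apply the selected rules in order."""
    for rule in rules:
        term = rule(term)
    return term

def build_index(dictionary: List[Entry], rules: List[Rule]) -> Dict[str, Set[str]]:
    """Map each normalized form to its concept IDs (average O(1) hash lookup)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term, cid in dictionary:
        index[normalize(term, rules)].add(cid)
    return dict(index)

def ambiguity(index: Dict[str, Set[str]]) -> int:
    """Assumed measure: number of normalized keys shared by two or more concepts."""
    return sum(1 for cids in index.values() if len(cids) > 1)

def variability(dictionary: List[Entry], rules: List[Rule]) -> int:
    """Assumed measure: distinct normalized forms per concept, summed over concepts."""
    forms: Dict[str, Set[str]] = defaultdict(set)
    for term, cid in dictionary:
        forms[cid].add(normalize(term, rules))
    return sum(len(f) for f in forms.values())

def score(dictionary: List[Entry], rules: List[Rule], ambiguity_weight: float) -> float:
    """Lower is better: few forms per concept, few collisions between concepts."""
    index = build_index(dictionary, rules)
    return variability(dictionary, rules) + ambiguity_weight * ambiguity(index)

def greedy_select(dictionary: List[Entry], candidates: List[Rule],
                  ambiguity_weight: float = 2.0) -> Tuple[List[Rule], float]:
    """Append the candidate that lowers the score most; stop when none helps."""
    selected: List[Rule] = []
    best = score(dictionary, selected, ambiguity_weight)
    remaining = list(candidates)
    while remaining:
        # Ties are broken by candidate order (the index in the tuple).
        trials = [(score(dictionary, selected + [r], ambiguity_weight), i)
                  for i, r in enumerate(remaining)]
        trial_score, idx = min(trials)
        if trial_score >= best:
            break
        best = trial_score
        selected.append(remaining.pop(idx))
    return selected, best

def lookup(index: Dict[str, Set[str]], rules: List[Rule], query: str) -> Set[str]:
    """Normalize the query with the same rules, then do one hash lookup."""
    return index.get(normalize(query, rules), set())

# Toy dictionary with made-up concept IDs (not real UMLS or BioThesaurus entries).
EXAMPLE_DICTIONARY: List[Entry] = [
    ("IL-2", "C1"), ("IL2", "C1"), ("il-2", "C1"),
    ("TNF alpha", "C2"), ("TNF-alpha", "C2"), ("tnf alpha", "C2"),
    ("IL-3", "C3"),
]

if __name__ == "__main__":
    rules, best = greedy_select(
        EXAMPLE_DICTIONARY, [lowercase, drop_hyphens, drop_spaces, drop_digits])
    print([r.__name__ for r in rules], best)
    index = build_index(EXAMPLE_DICTIONARY, rules)
    print(lookup(index, rules, "Tnf-Alpha"))
```

On this toy dictionary the greedy search keeps `lowercase`, `drop_hyphens` and `drop_spaces` and rejects `drop_digits`. The reason is that `drop_digits` merges `IL-2` and `IL-3` into one key, which raises ambiguity without lowering variability any further. After selection, a query costs one normalization pass plus one hash lookup, independent of dictionary size on average.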
id pubmed-2352870
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2352870 (2008-04-29). BMC Bioinformatics, Proceedings. BioMed Central, 2008-04-11.
Copyright © 2008 Tsuruoka et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings