A comparison study on algorithms of detecting long forms for short forms in biomedical text

Bibliographic Details
Main Authors: Torii, Manabu; Hu, Zhang-zhi; Song, Min; Wu, Cathy H; Liu, Hongfang
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2217663/
https://www.ncbi.nlm.nih.gov/pubmed/18047706
http://dx.doi.org/10.1186/1471-2105-8-S9-S5
author Torii, Manabu
Hu, Zhang-zhi
Song, Min
Wu, Cathy H
Liu, Hongfang
collection PubMed
description MOTIVATION: As more research is dedicated to literature mining in the biomedical domain, a growing number of systems are available to choose from when building literature mining applications. In this study, we focus on one specific kind of literature mining task: detecting definitions of acronyms, abbreviations, and symbols in biomedical text. We denote acronyms, abbreviations, and symbols as short forms (SFs) and their corresponding definitions as long forms (LFs). The study was designed to answer the following questions: i) how well a system performs in detecting LFs in novel text, ii) how well various terminological knowledge bases cover SFs as synonyms of their LFs, and iii) how to combine results from various SF knowledge bases. METHOD: We evaluated the following three publicly available systems for detecting LFs for SFs: i) ALICE, a handcrafted pattern/rule-based system by Ao and Takagi, ii) a machine learning system by Chang et al., and iii) a simple alignment-based program by Schwartz and Hearst. In addition, we investigated the conceptual coverage of two terminological knowledge bases: i) the UMLS (the Unified Medical Language System), and ii) the BioThesaurus (a thesaurus of names for all UniProt protein records). We also implemented a web interface that provides a virtual integration of various SF knowledge bases. RESULTS: We found that the detection systems agree with each other in most cases, and that the existing terminological knowledge bases have good coverage of synonymous relationships for frequently defined LFs. The web interface allows people to detect SF definitions in text and to search several SF knowledge bases. AVAILABILITY: The web site is .
format Text
id pubmed-2217663
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2217663 2008-01-31 A comparison study on algorithms of detecting long forms for short forms in biomedical text Torii, Manabu; Hu, Zhang-zhi; Song, Min; Wu, Cathy H; Liu, Hongfang BMC Bioinformatics Proceedings
BioMed Central 2007-11-27 /pmc/articles/PMC2217663/ /pubmed/18047706 http://dx.doi.org/10.1186/1471-2105-8-S9-S5 Text en Copyright © 2007 Torii et al; licensee BioMed Central Ltd.
title A comparison study on algorithms of detecting long forms for short forms in biomedical text
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2217663/
https://www.ncbi.nlm.nih.gov/pubmed/18047706
http://dx.doi.org/10.1186/1471-2105-8-S9-S5