BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature
BACKGROUND: To automatically process large quantities of biological literature for knowledge discovery and information curation, text mining tools are becoming essential. Abbreviation recognition is related to NER and can be considered as a pair recognition task of a terminology and its correspondin...
Main authors: | Kuo, Cheng-Ju; Ling, Maurice HT; Lin, Kuan-Ting; Hsu, Chun-Nan |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788358/ https://www.ncbi.nlm.nih.gov/pubmed/19958517 http://dx.doi.org/10.1186/1471-2105-10-S15-S7 |
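
The summary describes abbreviation recognition as pairing a term with its corresponding abbreviation in free text. As a rough illustration of that task only, the sketch below pairs a parenthesised short form with the preceding words whose initials spell it. This is a naive initial-letter heuristic, not the machine-learning method of the paper; the function name `find_pairs` and the pattern `PAREN` are hypothetical.

```python
import re

# Hypothetical pattern: a parenthesised token of 2-10 alphanumerics, e.g. "(AR)".
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9]{1,9})\)")


def find_pairs(text):
    """Return (short form, long form) pairs found by a naive initials match.

    Assumption: the long form is exactly the N words before the parenthesis,
    where N is the length of the short form, and each word contributes its
    first letter. Real abbreviations often break this rule.
    """
    pairs = []
    for match in PAREN.finditer(text):
        short = match.group(1)
        # Words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(short):]
        if len(candidate) < len(short):
            continue  # not enough preceding words to form a long form
        initials = "".join(word[0] for word in candidate)
        if initials.lower() == short.lower():
            pairs.append((short, " ".join(candidate)))
    return pairs


sentence = "Our approach to abbreviation recognition (AR) is based on machine-learning."
print(find_pairs(sentence))
```

A heuristic like this misses short forms that skip words or draw letters from inside a word, which is one reason a learned model with richer features may do better.
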
_version_ | 1782174963169492992 |
---|---|
author | Kuo, Cheng-Ju; Ling, Maurice HT; Lin, Kuan-Ting; Hsu, Chun-Nan |
author_facet | Kuo, Cheng-Ju; Ling, Maurice HT; Lin, Kuan-Ting; Hsu, Chun-Nan |
author_sort | Kuo, Cheng-Ju |
collection | PubMed |
description | BACKGROUND: To automatically process large quantities of biological literature for knowledge discovery and information curation, text mining tools are becoming essential. Abbreviation recognition is related to NER and can be considered as a pair recognition task of a terminology and its corresponding abbreviation from free text. The successful identification of an abbreviation and its corresponding definition is not only a prerequisite to index terms of text databases to produce articles of related interests, but also a building block to improve existing gene mention tagging and gene normalization tools. RESULTS: Our approach to abbreviation recognition (AR) is based on machine learning, which exploits a novel set of rich features to learn rules from training data. Tested on the AB3P corpus, our system demonstrated an F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result achieved by the existing best AR performance system. We also annotated a new corpus of 1200 PubMed abstracts which was derived from the BioCreative II gene normalization corpus. On our annotated corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall, which also outperforms all tested systems. CONCLUSION: By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI. Mining BIOADI reveals many interesting trends of bio-medical research. We also provide off-line AR software in the download section on http://bioagent.iis.sinica.edu.tw/BIOADI/. |
format | Text |
id | pubmed-2788358 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27883582009-12-04 BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature Kuo, Cheng-Ju Ling, Maurice HT Lin, Kuan-Ting Hsu, Chun-Nan BMC Bioinformatics Proceedings BACKGROUND: To automatically process large quantities of biological literature for knowledge discovery and information curation, text mining tools are becoming essential. Abbreviation recognition is related to NER and can be considered as a pair recognition task of a terminology and its corresponding abbreviation from free text. The successful identification of abbreviation and its corresponding definition is not only a prerequisite to index terms of text databases to produce articles of related interests, but also a building block to improve existing gene mention tagging and gene normalization tools. RESULTS: Our approach to abbreviation recognition (AR) is based on machine-learning, which exploits a novel set of rich features to learn rules from training data. Tested on the AB3P corpus, our system demonstrated a F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result achieved by the existing best AR performance system. We also annotated a new corpus of 1200 PubMed abstracts which was derived from BioCreative II gene normalization corpus. On our annotated corpus, our system achieved a F-score of 86.20% with 93.52% precision at 79.95% recall, which also outperforms all tested systems. CONCLUSION: By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI. Mining BIOADI reveals many interesting trends of bio-medical research. Besides, we also provide an off-line AR software in the download section on http://bioagent.iis.sinica.edu.tw/BIOADI/. BioMed Central 2009-12-03 /pmc/articles/PMC2788358/ /pubmed/19958517 http://dx.doi.org/10.1186/1471-2105-10-S15-S7 Text en Copyright © 2009 Kuo et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Kuo, Cheng-Ju Ling, Maurice HT Lin, Kuan-Ting Hsu, Chun-Nan BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature |
title | BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature |
title_full | BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature |
title_fullStr | BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature |
title_full_unstemmed | BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature |
title_short | BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature |
title_sort | bioadi: a machine learning approach to identifying abbreviations and definitions in biological literature |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788358/ https://www.ncbi.nlm.nih.gov/pubmed/19958517 http://dx.doi.org/10.1186/1471-2105-10-S15-S7 |
work_keys_str_mv | AT kuochengju bioadiamachinelearningapproachtoidentifyingabbreviationsanddefinitionsinbiologicalliterature AT lingmauriceht bioadiamachinelearningapproachtoidentifyingabbreviationsanddefinitionsinbiologicalliterature AT linkuanting bioadiamachinelearningapproachtoidentifyingabbreviationsanddefinitionsinbiologicalliterature AT hsuchunnan bioadiamachinelearningapproachtoidentifyingabbreviationsanddefinitionsinbiologicalliterature |
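
The F-scores quoted in the description field can be checked against the reported precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the record itself does not state which variant was used, and the function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the record's "F-score" is F1; inputs are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the description field.
ab3p_corpus = f_score(95.86, 84.64)
annotated_corpus = f_score(93.52, 79.95)

print(f"AB3P corpus: {ab3p_corpus:.2f}")            # 89.90
print(f"Annotated corpus: {annotated_corpus:.2f}")  # 86.20
```

Under that assumption, both computed values agree with the 89.90% and 86.20% given in the description.
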