Machine learning with naturally labeled data for identifying abbreviation definitions

BACKGROUND: The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches for the abbreviation definition identification task employ rule...

Bibliographic Details
Main Authors:	Yeganova, Lana, Comeau, Donald C, Wilbur, W John
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111592/ https://www.ncbi.nlm.nih.gov/pubmed/21658293 http://dx.doi.org/10.1186/1471-2105-12-S3-S6

author	Yeganova, Lana Comeau, Donald C Wilbur, W John
collection	PubMed
description	BACKGROUND: The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches for the abbreviation definition identification task employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data. METHODS: In this work, we develop a machine learning algorithm for abbreviation definition identification in text which makes use of what we term naturally labeled data. Positive training examples are naturally occurring potential abbreviation-definition pairs in text. Negative training examples are generated by randomly mixing potential abbreviations with unrelated potential definitions. The machine learner is trained to distinguish between these two sets of examples. Then, the learned feature weights are used to identify the abbreviation full form. This approach does not require manually labeled training data. RESULTS: We evaluate the performance of our algorithm on the Ab3P, BIOADI and Medstract corpora. Our system demonstrated results that compare favourably to the existing Ab3P and BIOADI systems. We achieve an F-measure of 91.36% on Ab3P corpus, and an F-measure of 87.13% on BIOADI corpus which are superior to the results reported by Ab3P and BIOADI systems. Moreover, we outperform these systems in terms of recall, which is one of our goals.
format	Online Article Text
id	pubmed-3111592
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3111592 2011-06-11 Machine learning with naturally labeled data for identifying abbreviation definitions Yeganova, Lana Comeau, Donald C Wilbur, W John BMC Bioinformatics Research BioMed Central 2011-06-09 /pmc/articles/PMC3111592/ /pubmed/21658293 http://dx.doi.org/10.1186/1471-2105-12-S3-S6 Text en This article is in the public domain.
title	Machine learning with naturally labeled data for identifying abbreviation definitions
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111592/ https://www.ncbi.nlm.nih.gov/pubmed/21658293 http://dx.doi.org/10.1186/1471-2105-12-S3-S6
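
The METHODS part of the abstract describes training data that needs no manual labels: naturally occurring potential abbreviation-definition pairs serve as positives, and negatives are produced by randomly mixing potential abbreviations with unrelated potential definitions. A minimal Python sketch of that data-construction idea follows. It is not the authors' implementation: the regular expression, the five-word definition window, the example sentences, and the names `extract_candidates` and `make_negatives` are all assumptions made for illustration, and the classifier training and feature weighting mentioned in the abstract are not shown.

```python
import random
import re

# Assumption: a short parenthesized token counts as a potential abbreviation.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def extract_candidates(sentence, max_words=5):
    """Return (abbreviation, candidate_definition) pairs found in one sentence."""
    pairs = []
    for match in PAREN.finditer(sentence):
        abbreviation = match.group(1)
        # Assumption: the candidate definition is the window of up to
        # max_words words immediately preceding the parenthesis.
        preceding = sentence[:match.start()].split()
        if preceding:
            pairs.append((abbreviation, " ".join(preceding[-max_words:])))
    return pairs


def make_negatives(positives, seed=0):
    """Pair each abbreviation with a candidate definition from a different pair."""
    rng = random.Random(seed)
    negatives = []
    for i, (abbreviation, _) in enumerate(positives):
        # Candidate definitions that occurred with some other abbreviation.
        others = [d for j, (_, d) in enumerate(positives) if j != i]
        if others:
            negatives.append((abbreviation, rng.choice(others)))
    return negatives


# Hypothetical example sentences, not taken from any of the evaluation corpora.
sentences = [
    "Patients received magnetic resonance imaging (MRI) within two days.",
    "We measured body mass index (BMI) at baseline.",
    "Samples were analyzed by polymerase chain reaction (PCR).",
]

positives = [pair for s in sentences for pair in extract_candidates(s)]
negatives = make_negatives(positives)

# Label 1 marks naturally occurring pairs, label 0 marks randomly mixed pairs.
labeled = [(a, d, 1) for a, d in positives] + [(a, d, 0) for a, d in negatives]
```

The resulting `labeled` list could then be fed to any binary classifier. Note that the positive candidates deliberately include extra leading words: per the abstract, it is the learned feature weights that are later used to identify the actual full form.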
