An open-source framework for large-scale, flexible evaluation of biomedical text mining systems

BACKGROUND: Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The...

Bibliographic Details
Main authors:	Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2276192/ https://www.ncbi.nlm.nih.gov/pubmed/18230184 http://dx.doi.org/10.1186/1747-5333-3-1

author	Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence
collection	PubMed
description	BACKGROUND: Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain. RESULTS: Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision. CONCLUSION: The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net.
format	Text
id	pubmed-2276192
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2276192 2008-03-28 J Biomed Discov Collab Software BioMed Central 2008-01-29 Text en Copyright © 2008 Baumgartner et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An open-source framework for large-scale, flexible evaluation of biomedical text mining systems
topic	Software