Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge

BACKGROUND: Genome sciences have experienced an increasing demand for efficient text-processing tools that can extract biologically relevant information from the growing amount of published literature. In response, a range of text-mining and information-extraction tools have recently been developed...

Bibliographic Details
Main Authors:	Krallinger, Martin; Morgan, Alexander; Smith, Larry; Leitner, Florian; Tanabe, Lorraine; Wilbur, John; Hirschman, Lynette; Valencia, Alfonso
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559980/ https://www.ncbi.nlm.nih.gov/pubmed/18834487 http://dx.doi.org/10.1186/gb-2008-9-s2-s1

author	Krallinger, Martin; Morgan, Alexander; Smith, Larry; Leitner, Florian; Tanabe, Lorraine; Wilbur, John; Hirschman, Lynette; Valencia, Alfonso
author_sort	Krallinger, Martin
collection	PubMed
description	BACKGROUND: Genome sciences have experienced an increasing demand for efficient text-processing tools that can extract biologically relevant information from the growing amount of published literature. In response, a range of text-mining and information-extraction tools have recently been developed specifically for the biological domain. Such tools are only useful if they are designed to meet real-life tasks and if their performance can be estimated and compared. The BioCreative challenge (Critical Assessment of Information Extraction in Biology) consists of a collaborative initiative to provide a common evaluation framework for monitoring and assessing the state-of-the-art of text-mining systems applied to biologically relevant problems. RESULTS: The Second BioCreative assessment (2006 to 2007) attracted 44 teams from 13 countries worldwide, with the aim of evaluating current information-extraction/text-mining technologies developed for one or more of the three tasks defined for this challenge evaluation. These tasks included the recognition of gene mentions in abstracts (gene mention task); the extraction of a list of unique identifiers for human genes mentioned in abstracts (gene normalization task); and finally the extraction of physical protein-protein interaction annotation-relevant information (protein-protein interaction task). The 'gold standard' data used for evaluating submissions for the third task was provided by the interaction databases MINT (Molecular Interaction Database) and IntAct. CONCLUSION: The Second BioCreative assessment almost doubled the number of participants for each individual task when compared with the first BioCreative assessment. 
An overall improvement in terms of balanced precision and recall was observed for the best submissions for the gene mention task (F score 0.87); for the gene normalization task, the best results (F score 0.81) were comparable to those obtained for similar tasks posed at the first BioCreative challenge. In the case of the protein-protein interaction task, the importance and difficulties of experimentally confirmed annotation extraction from full-text articles were explored, yielding different results depending on the step of the annotation extraction workflow. A common characteristic observed in all three tasks was that the combination of system outputs could yield better results than any single system. Finally, the development of the first text-mining meta-server was promoted within the context of this community challenge.
format	Text
id	pubmed-2559980
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2559980 2008-10-04 Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge Krallinger, Martin; Morgan, Alexander; Smith, Larry; Leitner, Florian; Tanabe, Lorraine; Wilbur, John; Hirschman, Lynette; Valencia, Alfonso Genome Biol Research BACKGROUND: Genome sciences have experienced an increasing demand for efficient text-processing tools that can extract biologically relevant information from the growing amount of published literature. In response, a range of text-mining and information-extraction tools have recently been developed specifically for the biological domain. Such tools are only useful if they are designed to meet real-life tasks and if their performance can be estimated and compared. The BioCreative challenge (Critical Assessment of Information Extraction in Biology) consists of a collaborative initiative to provide a common evaluation framework for monitoring and assessing the state-of-the-art of text-mining systems applied to biologically relevant problems. RESULTS: The Second BioCreative assessment (2006 to 2007) attracted 44 teams from 13 countries worldwide, with the aim of evaluating current information-extraction/text-mining technologies developed for one or more of the three tasks defined for this challenge evaluation. These tasks included the recognition of gene mentions in abstracts (gene mention task); the extraction of a list of unique identifiers for human genes mentioned in abstracts (gene normalization task); and finally the extraction of physical protein-protein interaction annotation-relevant information (protein-protein interaction task). The 'gold standard' data used for evaluating submissions for the third task was provided by the interaction databases MINT (Molecular Interaction Database) and IntAct. CONCLUSION: The Second BioCreative assessment almost doubled the number of participants for each individual task when compared with the first BioCreative assessment.
An overall improvement in terms of balanced precision and recall was observed for the best submissions for the gene mention task (F score 0.87); for the gene normalization task, the best results (F score 0.81) were comparable to those obtained for similar tasks posed at the first BioCreative challenge. In the case of the protein-protein interaction task, the importance and difficulties of experimentally confirmed annotation extraction from full-text articles were explored, yielding different results depending on the step of the annotation extraction workflow. A common characteristic observed in all three tasks was that the combination of system outputs could yield better results than any single system. Finally, the development of the first text-mining meta-server was promoted within the context of this community challenge. BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559980/ /pubmed/18834487 http://dx.doi.org/10.1186/gb-2008-9-s2-s1 Text en Copyright © 2008 Krallinger et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559980/ https://www.ncbi.nlm.nih.gov/pubmed/18834487 http://dx.doi.org/10.1186/gb-2008-9-s2-s1
