Overview of the BioCreative III Workshop

Bibliographic Details
Main authors: Arighi, Cecilia N, Lu, Zhiyong, Krallinger, Martin, Cohen, Kevin B, Wilbur, W John, Valencia, Alfonso, Hirschman, Lynette, Wu, Cathy H
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269932/
https://www.ncbi.nlm.nih.gov/pubmed/22151647
http://dx.doi.org/10.1186/1471-2105-12-S8-S1
_version_ 1782222521875038208
author Arighi, Cecilia N
Lu, Zhiyong
Krallinger, Martin
Cohen, Kevin B
Wilbur, W John
Valencia, Alfonso
Hirschman, Lynette
Wu, Cathy H
author_facet Arighi, Cecilia N
Lu, Zhiyong
Krallinger, Martin
Cohen, Kevin B
Wilbur, W John
Valencia, Alfonso
Hirschman, Lynette
Wu, Cathy H
author_sort Arighi, Cecilia N
collection PubMed
description BACKGROUND: The overall goal of the BioCreative Workshops is to promote the development of text mining and text processing tools which are useful to the communities of researchers and database curators in the biological sciences. To this end BioCreative I was held in 2004, BioCreative II in 2007, and BioCreative II.5 in 2009. Each of these workshops involved human-annotated test data for several basic tasks in text mining applied to the biomedical literature. Participants in the workshops were invited to compete in the tasks by constructing software systems to perform the tasks automatically and were given scores based on their performance. The results of these workshops have benefited the community in several ways. They have 1) provided evidence for the most effective methods currently available to solve specific problems; 2) revealed the current state of the art for performance on those problems; and 3) provided gold standard data and results on that data by which future advances can be gauged. This special issue contains overview papers for the three tasks of BioCreative III. RESULTS: The BioCreative III Workshop was held in September of 2010 and continued the tradition of a challenge evaluation on several tasks judged basic to effective text mining in biology, including a gene normalization (GN) task and two protein-protein interaction (PPI) tasks. In total, the Workshop involved the work of twenty-three teams. Thirteen teams participated in the GN task, which required the assignment of EntrezGene IDs to all named genes in full-text papers without any species information being provided to a system. Ten teams participated in the PPI article classification task (ACT) requiring a system to classify and rank a PubMed® record as belonging to an article either having or not having “PPI relevant” information.
Eight teams participated in the PPI interaction method task (IMT) where systems were given full-text documents and were required to extract the experimental methods used to establish PPIs and a text segment supporting each such method. Gold standard data was compiled for each of these tasks and participants competed in developing systems to perform the tasks automatically. BioCreative III also introduced a new interactive task (IAT), run as a demonstration task. The goal was to develop an interactive system to facilitate a user’s annotation of the unique database identifiers for all the genes appearing in an article. This task included ranking genes by importance (based preferably on the amount of described experimental information regarding genes). There was also an optional task to assist the user in finding the most relevant articles about a given gene. For BioCreative III, a user advisory group (UAG) was assembled and played an important role 1) in producing some of the gold standard annotations for the GN task, 2) in critiquing IAT systems, and 3) in providing guidance for a future, more rigorous evaluation of IAT systems. Six teams participated in the IAT demonstration task and received feedback on their systems from the UAG. Besides innovations in the GN and PPI tasks making them more realistic and practical and the introduction of the IAT task, discussions were begun on community data standards to promote interoperability and on user requirements and evaluation metrics to address utility and usability of systems. CONCLUSIONS: In this paper we give a brief history of the BioCreative Workshops and how they relate to other text mining competitions in biology. This is followed by a synopsis of the three tasks GN, PPI, and IAT in BioCreative III with figures for best participant performance on the GN and PPI tasks.
These results are discussed and compared with results from previous BioCreative Workshops, and we conclude that the best-performing systems for GN, PPI-ACT and PPI-IMT in realistic settings are not sufficient for fully automatic use. This provides evidence for the importance of interactive systems, and we present our vision of how best to construct an interactive system for a GN or PPI-like task in the remainder of the paper.
format Online
Article
Text
id pubmed-3269932
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3269932 2012-02-02 Overview of the BioCreative III Workshop Arighi, Cecilia N Lu, Zhiyong Krallinger, Martin Cohen, Kevin B Wilbur, W John Valencia, Alfonso Hirschman, Lynette Wu, Cathy H BMC Bioinformatics Research BioMed Central 2011-10-03 /pmc/articles/PMC3269932/ /pubmed/22151647 http://dx.doi.org/10.1186/1471-2105-12-S8-S1 Text en Copyright ©2011 Arighi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Arighi, Cecilia N
Lu, Zhiyong
Krallinger, Martin
Cohen, Kevin B
Wilbur, W John
Valencia, Alfonso
Hirschman, Lynette
Wu, Cathy H
Overview of the BioCreative III Workshop
title Overview of the BioCreative III Workshop
title_full Overview of the BioCreative III Workshop
title_fullStr Overview of the BioCreative III Workshop
title_full_unstemmed Overview of the BioCreative III Workshop
title_short Overview of the BioCreative III Workshop
title_sort overview of the biocreative iii workshop
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269932/
https://www.ncbi.nlm.nih.gov/pubmed/22151647
http://dx.doi.org/10.1186/1471-2105-12-S8-S1
work_keys_str_mv AT arighicecilian overviewofthebiocreativeiiiworkshop
AT luzhiyong overviewofthebiocreativeiiiworkshop
AT krallingermartin overviewofthebiocreativeiiiworkshop
AT cohenkevinb overviewofthebiocreativeiiiworkshop
AT wilburwjohn overviewofthebiocreativeiiiworkshop
AT valenciaalfonso overviewofthebiocreativeiiiworkshop
AT hirschmanlynette overviewofthebiocreativeiiiworkshop
AT wucathyh overviewofthebiocreativeiiiworkshop