Evaluation of BioCreAtIvE assessment of task 2

Bibliographic Details
Main authors: Blaschke, Christian, Leon, Eduardo Andres, Krallinger, Martin, Valencia, Alfonso
Format: Text
Language: English
Published: BioMed Central 2005
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869008/
https://www.ncbi.nlm.nih.gov/pubmed/15960828
http://dx.doi.org/10.1186/1471-2105-6-S1-S16
author Blaschke, Christian
Leon, Eduardo Andres
Krallinger, Martin
Valencia, Alfonso
collection PubMed
description BACKGROUND: Molecular biology has accumulated substantial amounts of data on the functions of genes and proteins. Functional descriptions are generally extracted manually from text and stored in biological databases, building up annotations for large collections of gene products. These annotation databases are crucial for interpreting large-scale analyses, whether computational or experimental. Because functional descriptions keep accumulating in the biomedical literature, text mining tools that facilitate the extraction of such annotations are urgently needed. To make these tools usable in real-world scenarios, for instance to assist database curators in annotating protein function, different approaches must be compared and evaluated on full-text articles. RESULTS: The Critical Assessment for Information Extraction in Biology (BioCreAtIvE) contest is a community-wide competition that evaluates different text mining strategies applied to the biomedical literature. We report on task 2, which addressed the automatic extraction and assignment of Gene Ontology (GO) annotations for human proteins from full-text articles. Task 2 predictions are triplets of protein, GO term, and article passage. Participants returned the annotation-relevant text passages, which were then evaluated by expert curators of the GO Annotation (GOA) team at the European Bioinformatics Institute (EBI). Each participant could submit up to three result sets for each sub-task of task 2; in total, participants provided more than 15,000 individual results. In addition to the annotation itself, the curators evaluated whether the protein and the GO term were correctly predicted and traceable through the submitted text fragment.
CONCLUSION: GO concepts are currently the most widely used set of terms for annotating gene products, so they were used to assess how effectively text mining tools can extract such annotations automatically. Although the results are promising, they are still far from the performance demanded by real-world applications. The principal difficulties were the complex nature of GO terms and protein names (the wide range of variants used to express proteins, and especially GO terms, in free text) and the lack of a standard training set. Participants tackled the task with a wide range of very different strategies. The dataset generated for the BioCreAtIvE challenge is publicly available and opens new possibilities for training information extraction methods in molecular biology.
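The triplet-based prediction format and the curator check described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the BioCreAtIvE submission format or scoring code: the class names, the two boolean verdict fields, and the simple fraction-correct summary are hypothetical, and the actual curator evaluation criteria are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One task 2 prediction: a protein, a GO term, and the supporting passage.

    Field formats are hypothetical; they only mirror the triplet idea.
    """
    protein: str   # protein identifier (hypothetical format)
    go_term: str   # GO identifier of the form "GO:nnnnnnn"
    passage: str   # text fragment returned from the full-text article


@dataclass(frozen=True)
class Judgement:
    """Hypothetical curator verdict on one prediction (assumed boolean fields)."""
    protein_correct: bool  # protein correctly predicted and traceable in the passage
    go_term_correct: bool  # GO term correctly predicted and traceable in the passage


def fraction_correct(judgements: list[Judgement]) -> dict[str, float]:
    """Share of judged predictions whose protein, GO term, or both were correct."""
    n = len(judgements)
    if n == 0:
        return {"protein": 0.0, "go_term": 0.0, "both": 0.0}
    protein = sum(j.protein_correct for j in judgements)
    go_term = sum(j.go_term_correct for j in judgements)
    both = sum(j.protein_correct and j.go_term_correct for j in judgements)
    return {"protein": protein / n, "go_term": go_term / n, "both": both / n}
```

Keeping the protein and GO term verdicts separate reflects the abstract's point that curators judged each element of the triplet, not only the annotation as a whole.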
format Text
id pubmed-1869008
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1869008 2007-05-18 Evaluation of BioCreAtIvE assessment of task 2 Blaschke, Christian Leon, Eduardo Andres Krallinger, Martin Valencia, Alfonso BMC Bioinformatics Report BioMed Central 2005-05-24 /pmc/articles/PMC1869008/ /pubmed/15960828 http://dx.doi.org/10.1186/1471-2105-6-S1-S16 Text en Copyright © 2005 Blaschke et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluation of BioCreAtIvE assessment of task 2
topic Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869008/
https://www.ncbi.nlm.nih.gov/pubmed/15960828
http://dx.doi.org/10.1186/1471-2105-6-S1-S16