An evaluation of GO annotation retrieval for BioCreAtIvE and GOA

Bibliographic Details
Main authors: Camon, Evelyn B; Barrell, Daniel G; Dimmer, Emily C; Lee, Vivian; Magrane, Michele; Maslen, John; Binns, David; Apweiler, Rolf
Format: Text
Language: English
Published: BioMed Central, 2005
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869009/
https://www.ncbi.nlm.nih.gov/pubmed/15960829
http://dx.doi.org/10.1186/1471-2105-6-S1-S17
Collection: PubMed
Abstract:
BACKGROUND: The Gene Ontology Annotation (GOA) database aims to provide high-quality supplementary GO annotation to proteins in the UniProt Knowledgebase. Like many other biological databases, GOA gathers much of its content from careful manual curation of the literature. However, as the volume of literature and the number of proteins requiring characterization both increase, manual curation capacity can become overloaded. Consequently, semi-automated aids are often employed to expedite curation. Traditionally, electronic annotation techniques in GOA have depended largely on exploiting the knowledge in existing resources such as InterPro, but in recent years text mining has been hailed as a potentially useful aid to curation. To encourage the development of such tools, the GOA team at EBI agreed to take part in the functional annotation task of the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge. BioCreAtIvE task 2 was an experiment to test whether automatically derived classification, using information retrieval and extraction, could assist expert biologists in annotating proteins in the UniProt Knowledgebase with terms from the GO vocabulary. GOA provided a training corpus of over 9000 manual GO annotations extracted from the literature. For the test set, we provided a corpus of 200 new Journal of Biological Chemistry articles, which were used to annotate 286 human proteins with GO terms. A team of experts manually evaluated the results of 9 participating groups, each of which provided highlighted sentences supporting their GO and protein annotation predictions. Here, we give a biological perspective on the evaluation, explain how we annotate GO using the literature, and offer some suggestions for improving the precision of future text-retrieval and extraction techniques. Finally, we provide the results of the first inter-annotator agreement study for manual GO curation, as well as an assessment of our current electronic GO annotation strategies.
RESULTS: The GOA database currently extracts GO annotation from the literature with 91 to 100% precision and at least 72% recall. This sets a particularly high threshold for text-mining systems, which, in the initial results of BioCreAtIvE task 2 (GO annotation extraction and retrieval), predicted GO terms precisely only 10 to 20% of the time.
CONCLUSION: Improvements in the performance and accuracy of text mining for GO terms should be expected in the next BioCreAtIvE challenge. In the meantime, the manual and electronic GO annotation strategies already employed by GOA will provide high-quality annotations.
Record ID: pubmed-1869009
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Report), published 2005-05-24
Copyright: © 2005 Camon et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Report