
Protein annotation as term categorization in the gene ontology using word proximity networks

BACKGROUND: We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations o...


Bibliographic Details
Main authors: Verspoor, Karin, Cohn, Judith, Joslyn, Cliff, Mniszewski, Sue, Rechtsteiner, Andreas, Rocha, Luis M, Simas, Tiago
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869013/
https://www.ncbi.nlm.nih.gov/pubmed/15960833
http://dx.doi.org/10.1186/1471-2105-6-S1-S20
author Verspoor, Karin
Cohn, Judith
Joslyn, Cliff
Mniszewski, Sue
Rechtsteiner, Andreas
Rocha, Luis M
Simas, Tiago
collection PubMed
description BACKGROUND: We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations of two distinct methods: an unsupervised algorithm for expanding words associated with GO nodes, and an annotation methodology which treats annotation as categorization of terms from a protein's document neighborhood into the GO. RESULTS: The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful; we were able to successfully select appropriate evidence text for a given annotation in 38% of Task 2.1 queries by building on this method. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, though we show through subsequent analysis that this can be improved with a different parameter setting. Our architecture proved not to be very successful on the evidence text component of the task, in the configuration used to generate the submitted results. CONCLUSION: The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.
format Text
id pubmed-1869013
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1869013 2007-05-18 Protein annotation as term categorization in the gene ontology using word proximity networks Verspoor, Karin Cohn, Judith Joslyn, Cliff Mniszewski, Sue Rechtsteiner, Andreas Rocha, Luis M Simas, Tiago BMC Bioinformatics Report BioMed Central 2005-05-24 /pmc/articles/PMC1869013/ /pubmed/15960833 http://dx.doi.org/10.1186/1471-2105-6-S1-S20 Text en Copyright © 2005 Verspoor et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein annotation as term categorization in the gene ontology using word proximity networks
topic Report