Protein annotation as term categorization in the gene ontology using word proximity networks
BACKGROUND: We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations o...
Main authors: | Verspoor, Karin; Cohn, Judith; Joslyn, Cliff; Mniszewski, Sue; Rechtsteiner, Andreas; Rocha, Luis M; Simas, Tiago |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2005 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869013/ https://www.ncbi.nlm.nih.gov/pubmed/15960833 http://dx.doi.org/10.1186/1471-2105-6-S1-S20 |
_version_ | 1782133427582009344 |
---|---|
author | Verspoor, Karin Cohn, Judith Joslyn, Cliff Mniszewski, Sue Rechtsteiner, Andreas Rocha, Luis M Simas, Tiago |
author_facet | Verspoor, Karin Cohn, Judith Joslyn, Cliff Mniszewski, Sue Rechtsteiner, Andreas Rocha, Luis M Simas, Tiago |
author_sort | Verspoor, Karin |
collection | PubMed |
description | BACKGROUND: We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations of two distinct methods: an unsupervised algorithm for expanding words associated with GO nodes, and an annotation methodology which treats annotation as categorization of terms from a protein's document neighborhood into the GO. RESULTS: The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful; we were able to successfully select appropriate evidence text for a given annotation in 38% of Task 2.1 queries by building on this method. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, though we show through subsequent analysis that this can be improved with a different parameter setting. Our architecture proved not to be very successful on the evidence text component of the task, in the configuration used to generate the submitted results. CONCLUSION: The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall. |
format | Text |
id | pubmed-1869013 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1869013 2007-05-18 Protein annotation as term categorization in the gene ontology using word proximity networks Verspoor, Karin Cohn, Judith Joslyn, Cliff Mniszewski, Sue Rechtsteiner, Andreas Rocha, Luis M Simas, Tiago BMC Bioinformatics Report BACKGROUND: We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations of two distinct methods: an unsupervised algorithm for expanding words associated with GO nodes, and an annotation methodology which treats annotation as categorization of terms from a protein's document neighborhood into the GO. RESULTS: The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful; we were able to successfully select appropriate evidence text for a given annotation in 38% of Task 2.1 queries by building on this method. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, though we show through subsequent analysis that this can be improved with a different parameter setting. Our architecture proved not to be very successful on the evidence text component of the task, in the configuration used to generate the submitted results. CONCLUSION: The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall. 
BioMed Central 2005-05-24 /pmc/articles/PMC1869013/ /pubmed/15960833 http://dx.doi.org/10.1186/1471-2105-6-S1-S20 Text en Copyright © 2005 Verspoor et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Report Verspoor, Karin Cohn, Judith Joslyn, Cliff Mniszewski, Sue Rechtsteiner, Andreas Rocha, Luis M Simas, Tiago Protein annotation as term categorization in the gene ontology using word proximity networks |
title | Protein annotation as term categorization in the gene ontology using word proximity networks |
title_full | Protein annotation as term categorization in the gene ontology using word proximity networks |
title_fullStr | Protein annotation as term categorization in the gene ontology using word proximity networks |
title_full_unstemmed | Protein annotation as term categorization in the gene ontology using word proximity networks |
title_short | Protein annotation as term categorization in the gene ontology using word proximity networks |
title_sort | protein annotation as term categorization in the gene ontology using word proximity networks |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869013/ https://www.ncbi.nlm.nih.gov/pubmed/15960833 http://dx.doi.org/10.1186/1471-2105-6-S1-S20 |
work_keys_str_mv | AT verspoorkarin proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks AT cohnjudith proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks AT joslyncliff proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks AT mniszewskisue proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks AT rechtsteinerandreas proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks AT rochaluism proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks AT simastiago proteinannotationastermcategorizationinthegeneontologyusingwordproximitynetworks |