Automatic annotation of protein motif function with Gene Ontology terms

Bibliographic Details
Main authors: Lu, Xinghua, Zhai, Chengxiang, Gopalakrishnan, Vanathi, Buchanan, Bruce G
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC517493/
https://www.ncbi.nlm.nih.gov/pubmed/15345032
http://dx.doi.org/10.1186/1471-2105-5-122
author Lu, Xinghua
Zhai, Chengxiang
Gopalakrishnan, Vanathi
Buchanan, Bruce G
collection PubMed
description BACKGROUND: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. RESULTS: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. CONCLUSIONS: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
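The RESULTS paragraph combines a motif/GO-term mutual information score with frequency-based features in a logistic regression classifier. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. Mutual information is assumed to be the standard MI between two binary indicators over annotated sequences ("matches the motif", "carries the GO term"), taken from a 2x2 contingency table. The two extra features (fraction of motif-matching sequences that carry the term, and log co-occurrence count), the names `features`, `train_logistic_regression` and `predict_proba`, and the six training pairs are all hypothetical and synthetic, not data or features from the paper. The classifier is fitted by plain batch gradient descent on the mean log-loss.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical 2x2 contingency table over annotated sequences:
# (n11, n10, n01, n00) = (motif & term, motif only, term only, neither).
Counts = Tuple[int, int, int, int]


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """MI in nats between M = 'sequence matches the motif' and
    G = 'sequence is annotated with the GO term' (assumed standard definition)."""
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, row total for its M value, column total for its G value).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, row, col in cells:
        if joint > 0:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            # p(m,g) / (p(m) p(g)) simplifies to joint * n / (row * col).
            mi += (joint / n) * math.log(joint * n / (row * col))
    return mi


def features(c: Counts) -> List[float]:
    """Hypothetical feature vector: MI plus two frequency-based features."""
    n11, n10, _, _ = c
    motif_total = n11 + n10
    frac_with_term = n11 / motif_total if motif_total else 0.0
    return [mutual_information(*c), frac_with_term, math.log1p(n11)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(correct association | x). zip() pairs the feature weights with x;
    the extra last weight w[-1] is the bias term."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 10000
) -> List[float]:
    """Fit weights (features + bias) by batch gradient descent on mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Synthetic training pairs (illustrative only, not data from the paper):
# label 1 = motif/GO-term association assumed correct, 0 = incorrect.
TRAIN: List[Tuple[Counts, int]] = [
    ((8, 1, 2, 89), 1),
    ((5, 0, 1, 94), 1),
    ((10, 2, 0, 88), 1),
    ((1, 9, 10, 80), 0),
    ((0, 5, 20, 75), 0),
    ((2, 8, 15, 75), 0),
]
X_train = [features(c) for c, _ in TRAIN]
y_train = [label for _, label in TRAIN]

if __name__ == "__main__":
    weights = train_logistic_regression(X_train, y_train)
    for (c, label), x in zip(TRAIN, X_train):
        print(c, label, round(predict_proba(weights, x), 3))
```

With real data, the counts for each candidate (motif, GO term) pair would come from GO annotations and PROSITE matches on the same sequences. Hand-written gradient descent keeps the sketch dependency-free; in practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.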
format Text
id pubmed-517493
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-517493 2004-09-17 Automatic annotation of protein motif function with Gene Ontology terms Lu, Xinghua Zhai, Chengxiang Gopalakrishnan, Vanathi Buchanan, Bruce G BMC Bioinformatics Methodology Article BioMed Central 2004-09-02 /pmc/articles/PMC517493/ /pubmed/15345032 http://dx.doi.org/10.1186/1471-2105-5-122 Text en Copyright © 2004 Lu et al; licensee BioMed Central Ltd.
title Automatic annotation of protein motif function with Gene Ontology terms
topic Methodology Article