
A combined approach for genome wide protein function annotation/prediction

BACKGROUND: Today large scale genome sequencing technologies are uncovering an increasing amount of new genes and proteins, which remain uncharacterized. Experimental procedures for protein function prediction are low throughput by nature and thus can't be used to keep up with the rate at which...


Bibliographic Details
Main authors: Benso, Alfredo; Di Carlo, Stefano; ur Rehman, Hafeez; Politano, Gianfranco; Savino, Alessandro; Suravajhala, Prashanth
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909112/
https://www.ncbi.nlm.nih.gov/pubmed/24564915
http://dx.doi.org/10.1186/1477-5956-11-S1-S1
_version_ 1782301792066863104
author Benso, Alfredo
Di Carlo, Stefano
ur Rehman, Hafeez
Politano, Gianfranco
Savino, Alessandro
Suravajhala, Prashanth
collection PubMed
description BACKGROUND: Today, large-scale genome sequencing technologies are uncovering an increasing number of new genes and proteins, which remain uncharacterized. Experimental procedures for protein function prediction are low throughput by nature and thus cannot keep up with the rate at which new proteins are discovered. On the other hand, proteins are prominent stakeholders in almost all biological processes, so their functions must be known precisely in order to understand the underlying biological mechanisms. The challenge of annotating uncharacterized proteins in functional genomics, and in biology in general, motivates the use of well-orchestrated computational techniques to predict their functions accurately. METHODS: We propose a computational flow for the functional annotation of a protein that assigns the most probable functions to the protein by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model, we also compute and integrate term-specific relationships among functional terms based on Gene Ontology (GO). RESULTS: We tested our method on Saccharomyces cerevisiae and Homo sapiens proteins. The aggregation of different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of precision and accuracy of prediction. The precision and accuracy of prediction are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens proteins and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.
format Online
Article
Text
id pubmed-3909112
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3909112 2014-02-13 A combined approach for genome wide protein function annotation/prediction Benso, Alfredo Di Carlo, Stefano ur Rehman, Hafeez Politano, Gianfranco Savino, Alessandro Suravajhala, Prashanth Proteome Sci Research BioMed Central 2013-11-07 /pmc/articles/PMC3909112/ /pubmed/24564915 http://dx.doi.org/10.1186/1477-5956-11-S1-S1 Text en Copyright © 2013 Benso et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A combined approach for genome wide protein function annotation/prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909112/
https://www.ncbi.nlm.nih.gov/pubmed/24564915
http://dx.doi.org/10.1186/1477-5956-11-S1-S1