Predicting protein function via downward random walks on a gene ontology

BACKGROUND: High-throughput bio-techniques accumulate an ever-increasing amount of genomic and proteomic data. These data are far from being functionally characterized, despite advances in the functional annotation of genes (or gene product proteins). Due to experimental techniques and to the research b...

Bibliographic Details
Main authors: Yu, Guoxian; Zhu, Hailong; Domeniconi, Carlotta; Liu, Jiming
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551531/
https://www.ncbi.nlm.nih.gov/pubmed/26310806
http://dx.doi.org/10.1186/s12859-015-0713-y
author Yu, Guoxian
Zhu, Hailong
Domeniconi, Carlotta
Liu, Jiming
collection PubMed
description BACKGROUND: High-throughput bio-techniques accumulate an ever-increasing amount of genomic and proteomic data. These data are far from being functionally characterized, despite advances in the functional annotation of genes (or gene product proteins). Owing to the limits of experimental techniques and to research bias in biology, regularly updated functional annotation databases such as the Gene Ontology (GO) are far from complete. Given the importance of protein functions for biological studies and drug design, proteins should be annotated more comprehensively and precisely.
RESULTS: We propose downward Random Walks (dRW) to predict missing (or new) functions of partially annotated proteins. In particular, we apply downward random walks with restart on the GO directed acyclic graph, starting from the available functions of a protein, to estimate the probability of each missing function. To further boost prediction accuracy, we extend dRW to dRW-kNN. dRW-kNN computes the semantic similarity between proteins from their functional annotations; it then predicts functions from those estimated by dRW together with those associated with the k nearest proteins. The proposed models can predict two kinds of missing functions: (i) those that are missing for a protein but associated with other proteins of interest; (ii) those that are not available for any protein of interest but exist in the GO hierarchy. Experimental results on Yeast and Human proteins show that dRW and dRW-kNN replenish functions more accurately than other related approaches, especially for sparse functions associated with no more than 10 proteins.
CONCLUSION: The empirical study shows that the semantic similarity between GO terms and the ontology hierarchy play important roles in predicting protein function. The proposed dRW and dRW-kNN can serve as tools for replenishing the functions of partially annotated proteins.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0713-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4551531
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4551531 2015-08-29 BMC Bioinformatics Methodology Article BioMed Central 2015-08-27 /pmc/articles/PMC4551531/ /pubmed/26310806 http://dx.doi.org/10.1186/s12859-015-0713-y Text en © Yu et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predicting protein function via downward random walks on a gene ontology
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551531/
https://www.ncbi.nlm.nih.gov/pubmed/26310806
http://dx.doi.org/10.1186/s12859-015-0713-y
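
The abstract describes a downward random walk with restart on the GO directed acyclic graph, seeded with a protein's known functions. The following is a minimal sketch of that general idea, not the authors' implementation. Several things here are assumptions made for illustration: the function name `downward_rwr`, the uniform transition weights from a term to its children (the record states that semantic similarity between GO terms matters, which this sketch does not model), the restart probability of 0.5, the rule that probability mass reaching a leaf term returns to the seed terms, and the toy term names, which are not real GO identifiers.

```python
def downward_rwr(children, seeds, restart=0.5, iters=100, tol=1e-10):
    """Random walk with restart that only follows parent -> child edges.

    children: dict mapping a term to the list of its child terms (a DAG).
    seeds:    terms already annotated to the protein (the restart set).
    Returns a dict mapping every term to its estimated visiting probability.
    """
    # Collect every term that appears as a parent or as a child.
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)

    seeds = [s for s in seeds if s in nodes]
    if not seeds:
        raise ValueError("no seed term is present in the graph")

    # Restart distribution: uniform over the protein's known terms.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] += 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(iters):
        # Every step, a fraction `restart` of the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            mass = (1.0 - restart) * p[n]
            kids = children.get(n, [])
            if kids:
                # Assumption: uniform split among children.
                share = mass / len(kids)
                for k in kids:
                    nxt[k] += share
            else:
                # Assumption: a leaf has nowhere to go down, so its mass
                # returns to the seed terms and total probability stays 1.
                for s in nodes:
                    nxt[s] += mass * p0[s]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy hierarchy with hypothetical term names.
toy_children = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
}
scores = downward_rwr(toy_children, seeds=["A"])

# Candidate missing functions: unannotated terms, highest score first.
candidates = sorted(
    (t for t in scores if t != "A" and scores[t] > 0),
    key=lambda t: (-scores[t], t),
)
```

Because the walk only moves downward, terms above or beside the seed (`root`, `B`, `B1` in the toy graph) receive no probability, while the descendants of the seed are ranked as candidate missing functions. A kNN extension along the lines of dRW-kNN would combine these scores with the annotations of semantically similar proteins; that step is not sketched here.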