
Exploiting ontology graph for predicting sparsely annotated gene function

Motivation: Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated gen...


Bibliographic Details
Main authors: Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542782/
https://www.ncbi.nlm.nih.gov/pubmed/26072504
http://dx.doi.org/10.1093/bioinformatics/btv260
author Wang, Sheng
Cho, Hyunghoon
Zhai, ChengXiang
Berger, Bonnie
Peng, Jian
collection PubMed
description Motivation: Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated genes, which constitute about half of the GO terms in yeast, mouse and human, pose a unique challenge in that any prediction algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this ‘overfitting’ issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. Results: We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions. Availability and implementation: https://github.com/wangshenguiuc/clusDCA. Contact: jianpeng@illinois.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4542782
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4542782 2015-08-25 Exploiting ontology graph for predicting sparsely annotated gene function Wang, Sheng Cho, Hyunghoon Zhai, ChengXiang Berger, Bonnie Peng, Jian Bioinformatics ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4542782/ /pubmed/26072504 http://dx.doi.org/10.1093/bioinformatics/btv260 Text en © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Exploiting ontology graph for predicting sparsely annotated gene function
topic ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542782/
https://www.ncbi.nlm.nih.gov/pubmed/26072504
http://dx.doi.org/10.1093/bioinformatics/btv260