Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory

Bibliographic Details
Main Authors: Luo, Feng; Yang, Yunfeng; Zhong, Jianxin; Gao, Haichun; Khan, Latifur; Thompson, Dorothea K; Zhou, Jizhong
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2212665/
https://www.ncbi.nlm.nih.gov/pubmed/17697349
http://dx.doi.org/10.1186/1471-2105-8-299
author Luo, Feng
Yang, Yunfeng
Zhong, Jianxin
Gao, Haichun
Khan, Latifur
Thompson, Dorothea K
Zhou, Jizhong
author_sort Luo, Feng
collection PubMed
description BACKGROUND: Large-scale sequencing of entire genomes has ushered in a new age in biology. One of the next grand challenges is to dissect the cellular networks consisting of many individual functional modules. Defining co-expression networks without ambiguity based on genome-wide microarray data is difficult, and current methods are neither robust nor consistent across different data sets. This is particularly problematic for poorly understood organisms, since little existing biological knowledge can be exploited to determine the threshold that differentiates true correlation from random noise. Random matrix theory (RMT), which has been widely and successfully used in physics, is a powerful approach for distinguishing system-specific, non-random properties embedded in complex systems from random noise. Here, we have hypothesized that the universal predictions of RMT are also applicable to biological systems and that the correlation threshold can be determined by characterizing the correlation matrix of microarray profiles using random matrix theory.
RESULTS: Application of random matrix theory to microarray data of S. oneidensis, E. coli, yeast, A. thaliana, Drosophila, mouse and human indicates that there is a sharp transition in the nearest neighbour spacing distribution (NNSD) of the correlation matrix after gradually removing certain elements inside the matrix. Testing on an in silico modular model has demonstrated that this transition can be used to determine the correlation threshold for revealing modular co-expression networks. The co-expression network derived from yeast cell cycle microarray data is supported by gene annotation. The topological properties of the resulting co-expression network agree well with the general properties of biological networks. Computational evaluations have shown that the RMT approach is sensitive and robust. Furthermore, evaluation on sampled expression data of an in silico modular gene system has shown that under-sampled expressions do not affect the recovery of the gene co-expression network. Moreover, the cellular roles of 215 functionally unknown genes from yeast, E. coli and S. oneidensis are predicted by the gene co-expression networks using the guilt-by-association principle. Many of these predictions are supported by existing information or our experimental verification, further demonstrating the reliability of this approach for gene function prediction.
CONCLUSION: Our rigorous analysis of gene expression microarray profiles using RMT has shown that the transition of the NNSD of the correlation matrix of a microarray profile provides a profound theoretical criterion to determine the correlation threshold for identifying gene co-expression networks.
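The thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the two limiting NNSD shapes are the usual RMT ones (the Wigner surmise for the Gaussian orthogonal ensemble and the exponential law for Poisson statistics), that eigenvalues are unfolded with a simple polynomial fit, and that closeness to each shape is measured with a Kolmogorov-Smirnov distance. The abstract does not say which unfolding or goodness-of-fit test the paper uses, so both are stand-ins, and all function names are hypothetical.

```python
import numpy as np


def threshold_correlation(corr, cutoff):
    """Zero every entry whose absolute value is below cutoff; keep the diagonal."""
    kept = np.where(np.abs(corr) >= cutoff, corr, 0.0)
    np.fill_diagonal(kept, np.diag(corr))
    return kept


def unfolded_spacings(matrix, degree=7):
    """Nearest-neighbour eigenvalue spacings rescaled to unit mean.

    Assumption of this sketch: unfolding is done by fitting a polynomial
    to the eigenvalue staircase (eigenvalue -> rank).
    """
    eigvals = np.linalg.eigvalsh(matrix)  # symmetric input, ascending order
    ranks = np.arange(1, eigvals.size + 1)
    smooth = np.polynomial.Polynomial.fit(eigvals, ranks, degree)
    # Clip guards against tiny negative gaps where the fit is not monotone.
    spacings = np.clip(np.diff(smooth(eigvals)), 0.0, None)
    return spacings / spacings.mean()


def poisson_cdf(s):
    """CDF of the Poisson-statistics spacing law P(s) = exp(-s)."""
    return 1.0 - np.exp(-s)


def goe_cdf(s):
    """CDF of the Wigner surmise P(s) = (pi/2) s exp(-pi s^2 / 4)."""
    return 1.0 - np.exp(-np.pi * s**2 / 4.0)


def ks_distance(spacings, cdf):
    """Kolmogorov-Smirnov distance between the empirical and a model CDF."""
    s = np.sort(spacings)
    model = cdf(s)
    upper = np.arange(1, s.size + 1) / s.size
    lower = np.arange(0, s.size) / s.size
    return float(max(np.max(upper - model), np.max(model - lower)))


def nnsd_distances(matrix):
    """Return (distance to Poisson, distance to GOE) for a symmetric matrix."""
    s = unfolded_spacings(matrix)
    return ks_distance(s, poisson_cdf), ks_distance(s, goe_cdf)


def scan_thresholds(expression, cutoffs):
    """For each cutoff, threshold the gene-gene correlation matrix
    (rows of expression are genes) and report both NNSD distances."""
    corr = np.corrcoef(expression)
    return [(c, *nnsd_distances(threshold_correlation(corr, c))) for c in cutoffs]
```

Calling scan_thresholds on a genes-by-samples array with an increasing list of cutoffs returns one (cutoff, distance to Poisson, distance to GOE) triple per cutoff. Under the assumptions above, a candidate threshold would be the smallest cutoff at which the Poisson distance falls below the GOE distance and stays there.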
format Text
id pubmed-2212665
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2212665 2008-01-24 BMC Bioinformatics Research Article BioMed Central 2007-08-14 /pmc/articles/PMC2212665/ /pubmed/17697349 http://dx.doi.org/10.1186/1471-2105-8-299 Text en Copyright © 2007 Luo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2212665/
https://www.ncbi.nlm.nih.gov/pubmed/17697349
http://dx.doi.org/10.1186/1471-2105-8-299