
Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets o...


Bibliographic Details
Main authors: Narayanan, Manikandan; Vetta, Adrian; Schadt, Eric E.; Zhu, Jun
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2855327/
https://www.ncbi.nlm.nih.gov/pubmed/20419151
http://dx.doi.org/10.1371/journal.pcbi.1000742
_version_ 1782180170551001088
author Narayanan, Manikandan
Vetta, Adrian
Schadt, Eric E.
Zhu, Jun
author_facet Narayanan, Manikandan
Vetta, Adrian
Schadt, Eric E.
Zhu, Jun
author_sort Narayanan, Manikandan
collection PubMed
description Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
format Text
id pubmed-2855327
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2855327 2010-04-23 Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets Narayanan, Manikandan Vetta, Adrian Schadt, Eric E. Zhu, Jun PLoS Comput Biol Research Article Public Library of Science 2010-04-15 /pmc/articles/PMC2855327/ /pubmed/20419151 http://dx.doi.org/10.1371/journal.pcbi.1000742 Text en Narayanan et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Narayanan, Manikandan
Vetta, Adrian
Schadt, Eric E.
Zhu, Jun
Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
title Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
title_full Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
title_fullStr Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
title_full_unstemmed Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
title_short Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
title_sort simultaneous clustering of multiple gene expression and physical interaction datasets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2855327/
https://www.ncbi.nlm.nih.gov/pubmed/20419151
http://dx.doi.org/10.1371/journal.pcbi.1000742