Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets o...
Main authors: | Narayanan, Manikandan; Vetta, Adrian; Schadt, Eric E.; Zhu, Jun |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2855327/ https://www.ncbi.nlm.nih.gov/pubmed/20419151 http://dx.doi.org/10.1371/journal.pcbi.1000742 |
_version_ | 1782180170551001088 |
---|---|
author | Narayanan, Manikandan Vetta, Adrian Schadt, Eric E. Zhu, Jun |
author_facet | Narayanan, Manikandan Vetta, Adrian Schadt, Eric E. Zhu, Jun |
author_sort | Narayanan, Manikandan |
collection | PubMed |
description | Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes. |
format | Text |
id | pubmed-2855327 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2855327 2010-04-23 Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets Narayanan, Manikandan Vetta, Adrian Schadt, Eric E. Zhu, Jun PLoS Comput Biol Research Article Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes. Public Library of Science 2010-04-15 /pmc/articles/PMC2855327/ /pubmed/20419151 http://dx.doi.org/10.1371/journal.pcbi.1000742 Text en Narayanan et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Narayanan, Manikandan Vetta, Adrian Schadt, Eric E. Zhu, Jun Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets |
title | Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets |
title_full | Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets |
title_fullStr | Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets |
title_full_unstemmed | Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets |
title_short | Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets |
title_sort | simultaneous clustering of multiple gene expression and physical interaction datasets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2855327/ https://www.ncbi.nlm.nih.gov/pubmed/20419151 http://dx.doi.org/10.1371/journal.pcbi.1000742 |
work_keys_str_mv | AT narayananmanikandan simultaneousclusteringofmultiplegeneexpressionandphysicalinteractiondatasets AT vettaadrian simultaneousclusteringofmultiplegeneexpressionandphysicalinteractiondatasets AT schadterice simultaneousclusteringofmultiplegeneexpressionandphysicalinteractiondatasets AT zhujun simultaneousclusteringofmultiplegeneexpressionandphysicalinteractiondatasets |