
Bayesian correlated clustering to integrate multiple datasets

Bibliographic details
Main authors:	Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press, 2012
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3519452/ https://www.ncbi.nlm.nih.gov/pubmed/23047558 http://dx.doi.org/10.1093/bioinformatics/bts595

collection	PubMed
description	Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets.
Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.
Contact: D.L.Wild@warwick.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
id	pubmed-3519452
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics
publication dates	2012-10-09 (online), 2012-12 (issue)
rights	© The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
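The description states that each dataset has its own DMA mixture model and that the models are coupled through parameters describing agreement among the datasets. The Python sketch below illustrates one way such coupling can work. It is not taken from the MDI Matlab implementation. It assumes a prior in which an item's cluster labels across K datasets are weighted by the product of the per-dataset mixture weights, then multiplied by (1 + phi_kl) for every pair of datasets k, l that place the item in the same cluster index. The function names and the exact form of the prior are illustrative assumptions; consult the full article for the model as published.

```python
from itertools import combinations, product


def mdi_prior_weight(labels, pi, phi):
    """Unnormalised prior weight for one item's labels across K datasets.

    Assumed (illustrative) form:
        prod_k pi[k][labels[k]] * prod_{k<l} (1 + phi_kl * [labels[k] == labels[l]])

    labels: labels[k] is the cluster index of the item in dataset k.
    pi:     pi[k][c] is the mixture weight of cluster c in dataset k.
    phi:    dict mapping (k, l) with k < l to an agreement parameter >= 0;
            missing pairs are treated as 0 (no coupling).
    """
    weight = 1.0
    # Independent per-dataset mixture weights.
    for k, c in enumerate(labels):
        weight *= pi[k][c]
    # Each agreeing pair of datasets boosts the weight by (1 + phi_kl).
    for k, l in combinations(range(len(labels)), 2):
        if labels[k] == labels[l]:
            weight *= 1.0 + phi.get((k, l), 0.0)
    return weight


def mdi_prior(pi, phi):
    """Normalised prior over all label tuples, by brute-force enumeration.

    Only practical for a handful of datasets and clusters.
    """
    ranges = [range(len(p)) for p in pi]
    tuples = list(product(*ranges))
    weights = [mdi_prior_weight(t, pi, phi) for t in tuples]
    total = sum(weights)
    return {t: w / total for t, w in zip(tuples, weights)}


def agreement_probability(prior, k, l):
    """Prior probability that datasets k and l give the item the same label."""
    return sum(p for t, p in prior.items() if t[k] == t[l])
```

For two datasets with three equally weighted clusters each, phi = 0 leaves the datasets independent, so the prior probability that they agree is 1/3. Setting phi = 2 raises it to 0.6: the three agreeing label pairs each get weight 3/9 and the six disagreeing pairs each get 1/9, giving 1 / (5/3). Larger agreement parameters therefore encode a stronger prior belief that the datasets share cluster structure.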
