Bayesian correlated clustering to integrate multiple datasets
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervi...
Main Authors: | Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3519452/ https://www.ncbi.nlm.nih.gov/pubmed/23047558 http://dx.doi.org/10.1093/bioinformatics/bts595 |
_version_ | 1782252664626610176 |
---|---|
author | Kirk, Paul Griffin, Jim E. Savage, Richard S. Ghahramani, Zoubin Wild, David L. |
author_sort | Kirk, Paul |
collection | PubMed |
description | Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-3519452 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3519452 2013-02-22 Bayesian correlated clustering to integrate multiple datasets Kirk, Paul Griffin, Jim E. Savage, Richard S. Ghahramani, Zoubin Wild, David L. Bioinformatics Original Papers Oxford University Press 2012-12 2012-10-09 /pmc/articles/PMC3519452/ /pubmed/23047558 http://dx.doi.org/10.1093/bioinformatics/bts595 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Kirk, Paul Griffin, Jim E. Savage, Richard S. Ghahramani, Zoubin Wild, David L. Bayesian correlated clustering to integrate multiple datasets |
title | Bayesian correlated clustering to integrate multiple datasets |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3519452/ https://www.ncbi.nlm.nih.gov/pubmed/23047558 http://dx.doi.org/10.1093/bioinformatics/bts595 |
work_keys_str_mv | AT kirkpaul bayesiancorrelatedclusteringtointegratemultipledatasets AT griffinjime bayesiancorrelatedclusteringtointegratemultipledatasets AT savagerichards bayesiancorrelatedclusteringtointegratemultipledatasets AT ghahramanizoubin bayesiancorrelatedclusteringtointegratemultipledatasets AT wilddavidl bayesiancorrelatedclusteringtointegratemultipledatasets |
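
The description field above states that each dataset gets its own mixture model and that dependencies between the models are captured through parameters describing agreement among datasets. The following is a minimal Python sketch of one way such a coupling could work, not the Matlab implementation the record points to. It assumes that an item's prior weight is the product of its per-dataset mixture weights, multiplied by a factor `1 + phi[k][l]` whenever the item receives the same cluster label in datasets `k` and `l`. The function names, the form of the coupling, and the toy numbers are all illustrative assumptions.

```python
import itertools


def allocation_weight(labels, pi, phi):
    """Unnormalised prior weight of one item's cluster labels across K datasets.

    labels: tuple of K cluster indices, one per dataset
    pi:     pi[k][c] is the mixture weight of cluster c in dataset k
    phi:    phi[k][l] >= 0 is the assumed agreement parameter for datasets k, l
    """
    K = len(labels)
    weight = 1.0
    # Independent part: each dataset's own mixture weight.
    for k in range(K):
        weight *= pi[k][labels[k]]
    # Coupling part: up-weight every pair of datasets that agree on the label.
    for k in range(K - 1):
        for l in range(k + 1, K):
            if labels[k] == labels[l]:
                weight *= 1.0 + phi[k][l]
    return weight


def allocation_prior(pi, phi):
    """Normalised joint prior over all label combinations (small cases only)."""
    K = len(pi)
    n_clusters = len(pi[0])
    weights = {
        labels: allocation_weight(labels, pi, phi)
        for labels in itertools.product(range(n_clusters), repeat=K)
    }
    total = sum(weights.values())
    return {labels: w / total for labels, w in weights.items()}


# Toy example: two datasets, two clusters each, uniform mixture weights.
pi = [[0.5, 0.5], [0.5, 0.5]]
independent = allocation_prior(pi, [[0.0, 0.0], [0.0, 0.0]])
coupled = allocation_prior(pi, [[0.0, 3.0], [3.0, 0.0]])
print(independent[(0, 0)], coupled[(0, 0)], coupled[(0, 1)])
```

With the agreement parameter set to zero the two datasets are clustered independently; raising it shifts prior mass toward label combinations in which the datasets agree. The brute-force normalisation enumerates every label combination, so it is only practical for toy sizes.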