Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

Bibliographic Details
Main Authors: Gabasova, Evelina; Reid, John; Wernisch, Lorenz
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658176/
https://www.ncbi.nlm.nih.gov/pubmed/29036190
http://dx.doi.org/10.1371/journal.pcbi.1005781
author Gabasova, Evelina
Reid, John
Wernisch, Lorenz
collection PubMed
description Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared, consistent set of clusters across all datasets, and that most of the data samples follow this structure. However, in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model on both simulated and real-world datasets. The simulated data exemplify datasets with varying degrees of common structure. In such a setting, Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
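
The idea of global structure arising from local cluster assignments can be illustrated with a small sketch. This is not the authors' implementation and performs no Dirichlet mixture inference; it only assumes that per-dataset (local) labels are already available and treats each distinct combination of local labels as one global cluster. The function name `global_clusters`, the dataset names and the labels are all hypothetical.

```python
from collections import defaultdict


def global_clusters(local_labels):
    """Group samples by their combination of local cluster labels.

    local_labels: dict mapping a dataset name to a list of per-sample
    cluster labels. Every list must describe the same samples in the
    same order. (Hypothetical helper, for illustration only.)
    """
    names = sorted(local_labels)
    n_samples = len(local_labels[names[0]])
    if any(len(local_labels[name]) != n_samples for name in names):
        raise ValueError("all datasets must label the same samples")

    groups = defaultdict(list)
    for i in range(n_samples):
        # The global cluster of sample i is the tuple of its local labels.
        key = tuple(local_labels[name][i] for name in names)
        groups[key].append(i)
    return dict(groups)


# Made-up labels: samples 0-3 are joined in "methylation"
# but separated into two clusters in "expression".
example = {
    "expression": [0, 0, 1, 1, 1],
    "methylation": [0, 0, 0, 0, 1],
}
result = global_clusters(example)
print(result)
```

In this toy input, the two local clusterings disagree, yet three global groups emerge from the label combinations, which is the kind of context-dependent structure the abstract describes.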
format Online
Article
Text
id pubmed-5658176
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5658176 2017-11-09 Clusternomics: Integrative context-dependent clustering for heterogeneous datasets Gabasova, Evelina; Reid, John; Wernisch, Lorenz PLoS Comput Biol Research Article Public Library of Science 2017-10-16 /pmc/articles/PMC5658176/ /pubmed/29036190 http://dx.doi.org/10.1371/journal.pcbi.1005781 Text en © 2017 Gabasova et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658176/
https://www.ncbi.nlm.nih.gov/pubmed/29036190
http://dx.doi.org/10.1371/journal.pcbi.1005781