
Flexible model-based clustering of mixed binary and continuous data: application to genetic regulation and cancer

Clustering is used widely in ‘omics’ studies and is often tackled with standard methods, e.g. hierarchical clustering. However, the increasing need for integration of multiple data sets leads to a requirement for clustering methods applicable to mixed data types, where the straightforward applicatio...

Bibliographic Details
Main Authors:	Zainul Abidin, Fatin N., Westhead, David R.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Methods Online
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399749/ https://www.ncbi.nlm.nih.gov/pubmed/27994031 http://dx.doi.org/10.1093/nar/gkw1270

collection	PubMed
description	Clustering is used widely in ‘omics’ studies and is often tackled with standard methods, e.g. hierarchical clustering. However, the increasing need for integration of multiple data sets leads to a requirement for clustering methods applicable to mixed data types, where the straightforward application of standard methods is not necessarily the best approach. A particularly common problem involves clustering entities characterized by a mixture of binary data (e.g. presence/absence of mutations, binding, motifs and epigenetic marks) and continuous data (e.g. gene expression, protein abundance, metabolite levels). Here, we present a generic method based on a probabilistic model for clustering this type of data, and illustrate its application to genetic regulation and the clustering of cancer samples. We show that the resulting clusters lead to useful hypotheses: in the case of genetic regulation these concern regulation of groups of genes by specific sets of transcription factors and in the case of cancer samples combinations of gene mutations are related to patterns of gene expression. The clusters have potential mechanistic significance and in the latter case are significantly linked to survival. The method is available as a stand-alone software package (GNU General Public Licence) from http://github.com/BioToolsLeeds/FlexiCoClusteringPackage.git.
id	pubmed-5399749
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-5399749 2017-04-28 Flexible model-based clustering of mixed binary and continuous data: application to genetic regulation and cancer. Zainul Abidin, Fatin N.; Westhead, David R. Nucleic Acids Res. Methods Online. Oxford University Press 2017-04-20 2016-12-19 /pmc/articles/PMC5399749/ /pubmed/27994031 http://dx.doi.org/10.1093/nar/gkw1270 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
