Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics

Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm....

Bibliographic Details
Main authors: Sirinukunwattana, Korsuk, Savage, Richard S., Bari, Muhammad F., Snead, David R. J., Rajpoot, Nasir M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806770/
https://www.ncbi.nlm.nih.gov/pubmed/24194826
http://dx.doi.org/10.1371/journal.pone.0075748
_version_ 1782288428638928896
author Sirinukunwattana, Korsuk
Savage, Richard S.
Bari, Muhammad F.
Snead, David R. J.
Rajpoot, Nasir M.
author_sort Sirinukunwattana, Korsuk
collection PubMed
description Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/
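The role of the normal-gamma prior in the description above can be illustrated with a small sketch. This is not the authors' implementation (that is at the URL given above). It assumes a single feature, illustrative hyperparameters (`mu0`, `kappa0`, `alpha0`, `beta0`), and a simplified merge score that omits the Dirichlet-process weights used by the full BHC algorithm. The function names `log_marginal` and `merge_log_odds` are hypothetical.

```python
import math


def log_marginal(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of 1-D data under a Gaussian likelihood
    with a normal-gamma prior on (mean, precision).

    Prior (assumed parameterisation): precision ~ Gamma(alpha0, rate=beta0),
    mean | precision ~ Normal(mu0, 1 / (kappa0 * precision)).
    """
    n = len(xs)
    if n == 0:
        return 0.0  # empty data set contributes probability 1
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    # Standard conjugate posterior updates.
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return (
        math.lgamma(alpha_n) - math.lgamma(alpha0)
        + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
        + 0.5 * (math.log(kappa0) - math.log(kappa_n))
        - 0.5 * n * math.log(2.0 * math.pi)
    )


def merge_log_odds(a, b, **prior):
    """Simplified merge score: log evidence for one shared Gaussian
    minus log evidence for two independent Gaussians.
    (Full BHC also weights these hypotheses with a Dirichlet-process prior.)
    """
    merged = list(a) + list(b)
    return (log_marginal(merged, **prior)
            - log_marginal(a, **prior)
            - log_marginal(b, **prior))


if __name__ == "__main__":
    close = merge_log_odds([0.0, 0.1], [0.05, 0.15])
    far = merge_log_odds([0.0, 0.1], [10.0, 10.1])
    print("close pair:", close)
    print("far pair:  ", far)
```

With these default hyperparameters, the score is positive for the two nearby groups and negative for the two distant groups, which is the kind of model-selection signal a Bayesian hierarchical clustering procedure can use to decide whether to merge and where to stop.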
format Online
Article
Text
id pubmed-3806770
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3806770 2013-11-05 Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics Sirinukunwattana, Korsuk Savage, Richard S. Bari, Muhammad F. Snead, David R. J. Rajpoot, Nasir M. PLoS One Research Article Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/ Public Library of Science 2013-10-23 /pmc/articles/PMC3806770/ /pubmed/24194826 http://dx.doi.org/10.1371/journal.pone.0075748 Text en © 2013 Sirinukunwattana et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Bayesian Hierarchical Clustering for Studying Cancer Gene Expression Data with Unknown Statistics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806770/
https://www.ncbi.nlm.nih.gov/pubmed/24194826
http://dx.doi.org/10.1371/journal.pone.0075748