A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles
Main authors: | Zhang, Lin; Meng, Jia; Liu, Hui; Huang, Yufei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3481479/ https://www.ncbi.nlm.nih.gov/pubmed/23134689 http://dx.doi.org/10.1186/1471-2164-13-S6-S20 |
author | Zhang, Lin; Meng, Jia; Liu, Hui; Huang, Yufei |
collection | PubMed |
description | BACKGROUND: DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification that can be inherited through cell division. The two major types of aberrant methylation are hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases, including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1,500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, because methylation data have distinctive non-Gaussian characteristics, traditional clustering methods may not be appropriate for them, and determining the optimal number of clusters remains problematic. METHOD: We propose a Dirichlet process beta mixture model (DPBMM) that models DNA methylation profiles as an infinite mixture of beta distributions. The model automatically learns the relevant parameters, such as the cluster mixing proportions, the beta distribution parameters of each cluster and, in particular, the number of clusters. Because the model is high-dimensional and analytically intractable, we propose a "no-gaps" Gibbs sampling solution for computing the posterior distributions and hence the parameter estimates. RESULT: The proposed algorithm was tested on simulated data as well as on methylation data from 55 glioblastoma multiforme (GBM) brain tissue samples. To reduce the computational burden caused by the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters that DPBMM yields on data with different numbers of loci are statistically significant (P-value < 0.1), whereas hierarchical clustering cannot yield statistically significant clusters. |
id | pubmed-3481479 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Genomics (Research article) |
published | BioMed Central, 2012-10-26 |
rights | Copyright ©2012 Zhang et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles |
topic | Research |
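The METHOD part of the abstract describes a Dirichlet process beta mixture model in which each cluster has its own beta distribution per CpG locus and the number of clusters is learned from the data. The following minimal Python sketch illustrates that model structure under stated assumptions; it is not the authors' implementation. It uses a truncated stick-breaking approximation to the Dirichlet process (the truncation level and the uniform base measure for the beta shape parameters are hypothetical choices), and its label update is the blocked-Gibbs conditional for cluster assignments, not the "no-gaps" sampler the authors describe. The function names are illustrative only.

```python
import math
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j); the last
    component takes whatever stick remains so the weights sum to one.
    """
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def sample_dp_beta_mixture(n_samples, n_loci, alpha=1.0, truncation=20, seed=0):
    """Draw synthetic methylation profiles (values in [0, 1]) from the model."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Hypothetical base measure: each cluster/locus gets Beta(a, b) shapes
    # drawn uniformly from (0.5, 10); the paper's priors may differ.
    params = [[(rng.uniform(0.5, 10.0), rng.uniform(0.5, 10.0))
               for _ in range(n_loci)] for _ in range(truncation)]
    labels, data = [], []
    for _ in range(n_samples):
        k = rng.choices(range(truncation), weights=weights)[0]
        labels.append(k)
        data.append([rng.betavariate(a, b) for a, b in params[k]])
    return labels, data, weights, params


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def assignment_probabilities(x, weights, params, eps=1e-9):
    """p(z = k | x), proportional to pi_k * prod_l Beta(x_l; a_kl, b_kl).

    Label update of a truncated (blocked) Gibbs sampler, shown for
    illustration; it is not the "no-gaps" sampler named in the abstract.
    """
    log_terms = []
    for w, cluster in zip(weights, params):
        if w <= 0.0:
            log_terms.append(-math.inf)  # empty stick segment: impossible label
            continue
        lp = math.log(w)
        for value, (a, b) in zip(x, cluster):
            v = min(max(value, eps), 1.0 - eps)  # keep the logs finite at 0 and 1
            lp += beta_logpdf(v, a, b)
        log_terms.append(lp)
    m = max(log_terms)  # log-sum-exp shift for numerical stability
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

For example, `labels, data, weights, params = sample_dp_beta_mixture(55, 10)` draws 55 synthetic profiles over 10 loci, and `len(set(labels))` reports how many clusters are actually occupied, which is usually well below the truncation level because the Dirichlet process concentrates mass on a few components. `assignment_probabilities(data[0], weights, params)` then gives the conditional cluster probabilities for the first profile.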