
A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles

BACKGROUND: DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation include hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in...


Bibliographic Details
Main authors: Zhang, Lin; Meng, Jia; Liu, Hui; Huang, Yufei
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3481479/
https://www.ncbi.nlm.nih.gov/pubmed/23134689
http://dx.doi.org/10.1186/1471-2164-13-S6-S20
author Zhang, Lin
Meng, Jia
Liu, Hui
Huang, Yufei
collection PubMed
description BACKGROUND: DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification that can be inherited through cell division. The two major types of aberrant methylation are hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases, including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, because of the unique non-Gaussian characteristics of DNA methylation data, traditional clustering methods may not be appropriate, and determining the optimal number of clusters remains problematic. METHOD: A Dirichlet process beta mixture model (DPBMM) is proposed that models DNA methylation levels as an infinite mixture of beta distributions. The model allows automatic learning of the relevant parameters, such as the cluster mixing proportions, the parameters of the beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we propose a Gibbs sampling "no-gaps" solution for computing the posterior distributions and hence the parameter estimates. RESULT: The proposed algorithm was tested on simulated data as well as methylation data from 55 glioblastoma multiforme (GBM) brain tissue samples. To reduce the computational burden caused by the high data dimensionality, a dimension reduction method was adopted. The two GBM clusters yielded by DPBMM are based on data with different numbers of loci (P-value < 0.1), whereas hierarchical clustering did not yield statistically significant clusters.
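The generative idea behind a Dirichlet process beta mixture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' "no-gaps" Gibbs sampler: cluster labels are drawn from a Chinese restaurant process (one standard representation of the Dirichlet process), and each cluster gets its own beta parameters per locus. The function names, the uniform prior on the shape parameters, and the clamping constant are all hypothetical choices made for the example.

```python
import math
import random


def crp_assignments(n_samples, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process (concentration alpha)."""
    labels = []
    counts = []  # counts[k] = number of samples already in cluster k
    for _ in range(n_samples):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def beta_logpdf(x, a, b, eps=1e-12):
    """Log density of Beta(a, b) at x; x is clamped away from 0 and 1 (assumption)."""
    x = min(max(x, eps), 1.0 - eps)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def simulate_profiles(n_samples, n_loci, alpha=1.0, seed=0):
    """Simulate methylation profiles in [0, 1] from a DP beta mixture."""
    rng = random.Random(seed)
    labels = crp_assignments(n_samples, alpha, rng)
    n_clusters = max(labels) + 1
    # Hypothetical base measure: shape parameters uniform on [0.5, 5] per locus.
    params = [
        [(rng.uniform(0.5, 5.0), rng.uniform(0.5, 5.0)) for _ in range(n_loci)]
        for _ in range(n_clusters)
    ]
    data = [[rng.betavariate(a, b) for (a, b) in params[k]] for k in labels]
    return labels, params, data


def profile_loglik(profile, cluster_params):
    """Log-likelihood of one profile under one cluster, loci treated as independent."""
    return sum(beta_logpdf(x, a, b) for x, (a, b) in zip(profile, cluster_params))
```

Calling `simulate_profiles(55, 10)` would produce 55 synthetic profiles over 10 loci with a data-driven number of clusters; a posterior sampler would then invert this process to recover labels and parameters from observed data.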
format Online
Article
Text
id pubmed-3481479
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3481479 2012-11-02 A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles Zhang, Lin Meng, Jia Liu, Hui Huang, Yufei BMC Genomics Research BioMed Central 2012-10-26 /pmc/articles/PMC3481479/ /pubmed/23134689 http://dx.doi.org/10.1186/1471-2164-13-S6-S20 Text en Copyright ©2012 Zhang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A nonparametric Bayesian approach for clustering bisulfate-based DNA methylation profiles
topic Research