Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions
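Main authors: | Houseman, E Andres; Christensen, Brock C; Yeh, Ru-Fang; Marsit, Carmen J; Karagas, Margaret R; Wrensch, Margaret; Nelson, Heather H; Wiemels, Joseph; Zheng, Shichun; Wiencke, John K; Kelsey, Karl T |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553421/ https://www.ncbi.nlm.nih.gov/pubmed/18782434 http://dx.doi.org/10.1186/1471-2105-9-365 |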
author | Houseman, E Andres; Christensen, Brock C; Yeh, Ru-Fang; Marsit, Carmen J; Karagas, Margaret R; Wrensch, Margaret; Nelson, Heather H; Wiemels, Joseph; Zheng, Shichun; Wiencke, John K; Kelsey, Karl T |
---|---|
collection | PubMed |
description | BACKGROUND: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well-recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner. RESULTS: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on a set of normal tissue samples and show that the clusters are associated with tissue type as well as age. CONCLUSION: Our proposed recursively partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data. |
format | Text |
id | pubmed-2553421 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2553421; 2008-09-26; BMC Bioinformatics; Methodology Article; BioMed Central 2008-09-09; /pmc/articles/PMC2553421/ /pubmed/18782434 http://dx.doi.org/10.1186/1471-2105-9-365; Text; en; Copyright © 2008 Houseman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553421/ https://www.ncbi.nlm.nih.gov/pubmed/18782434 http://dx.doi.org/10.1186/1471-2105-9-365 |
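The abstract in this record describes the method only at a high level ("a model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model"). What follows is a minimal Python sketch of that general idea, built on our own assumptions. Each node fits a two-class mixture of independent per-locus beta distributions by EM. The node is split only if that fit beats a one-class fit by BIC, and the procedure then recurses on the hard-assigned halves. The method-of-moments M-step, the median-split initialization, the BIC stopping rule, and the names `rpmm_sketch`, `fit_two_class`, `min_size`, and `max_depth` are all hypothetical illustrations. They are not the authors' published algorithm or software.

```python
import math
import random
from typing import List, Sequence, Tuple

EPS = 1e-6  # clamp methylation values strictly inside (0, 1)


def beta_logpdf(x: float, a: float, b: float) -> float:
    """Log-density of Beta(a, b) at x."""
    x = min(max(x, EPS), 1 - EPS)
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def fit_beta(xs: Sequence[float], ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted method-of-moments Beta(a, b) fit (a simplification, not MLE)."""
    tot = sum(ws) + 1e-12
    m = sum(w * x for w, x in zip(ws, xs)) / tot
    v = sum(w * (x - m) ** 2 for w, x in zip(ws, xs)) / tot
    m = min(max(m, EPS), 1 - EPS)
    k = max(m * (1 - m) / max(v, 1e-6) - 1, 1e-3)  # k = a + b, kept positive
    return m * k, (1 - m) * k


def class_ll(sample: Sequence[float], params) -> float:
    """Log-likelihood of one sample under independent per-locus betas."""
    return sum(beta_logpdf(x, a, b) for x, (a, b) in zip(sample, params))


def fit_params(data, ws):
    """Per-locus beta fits for one class, given per-sample weights."""
    return [fit_beta([s[j] for s in data], ws) for j in range(len(data[0]))]


def fit_two_class(data, iters: int = 50):
    """EM for a two-class beta latent class model; returns (loglik, P(class 1))."""
    n = len(data)
    means = [sum(s) / len(s) for s in data]
    cut = sorted(means)[n // 2]  # deterministic start: median split on mean methylation
    resp = [0.9 if m >= cut else 0.1 for m in means]
    ll = -math.inf
    for _ in range(iters):
        pi1 = min(max(sum(resp) / n, EPS), 1 - EPS)
        p1 = fit_params(data, resp)                   # M-step, class 1
        p0 = fit_params(data, [1 - r for r in resp])  # M-step, class 0
        ll, resp = 0.0, []
        for s in data:                                # E-step (log-sum-exp)
            l1 = math.log(pi1) + class_ll(s, p1)
            l0 = math.log(1 - pi1) + class_ll(s, p0)
            top = max(l0, l1)
            lse = top + math.log(math.exp(l0 - top) + math.exp(l1 - top))
            resp.append(math.exp(l1 - lse))
            ll += lse
    return ll, resp


def rpmm_sketch(data, idx=None, min_size=4, depth=0, max_depth=4) -> List[List[int]]:
    """Recursively split samples while a 2-class fit beats a 1-class fit by BIC."""
    idx = list(range(len(data))) if idx is None else idx
    n = len(idx)
    if n < 2 * min_size or depth >= max_depth:
        return [idx]
    sub = [data[i] for i in idx]
    J = len(sub[0])
    p = fit_params(sub, [1.0] * n)
    ll1 = sum(class_ll(s, p) for s in sub)
    ll2, resp = fit_two_class(sub)
    bic1 = -2 * ll1 + 2 * J * math.log(n)        # 2 parameters per locus
    bic2 = -2 * ll2 + (4 * J + 1) * math.log(n)  # 2 classes + mixing weight
    left = [i for i, r in zip(idx, resp) if r >= 0.5]
    right = [i for i, r in zip(idx, resp) if r < 0.5]
    if bic2 >= bic1 or min(len(left), len(right)) < min_size:
        return [idx]  # leaf: splitting does not pay for its extra parameters
    return (rpmm_sketch(data, left, min_size, depth + 1, max_depth)
            + rpmm_sketch(data, right, min_size, depth + 1, max_depth))


# Synthetic example: 20 mostly unmethylated and 20 mostly methylated samples, 10 loci.
random.seed(0)
data = ([[random.betavariate(2, 8) for _ in range(10)] for _ in range(20)]
        + [[random.betavariate(8, 2) for _ in range(10)] for _ in range(20)])
leaves = rpmm_sketch(data)
print([len(leaf) for leaf in leaves])
```

Splitting on hard assignments keeps the sketch short. A fuller implementation might instead pass soft class memberships down the tree as weights. It would also use maximum-likelihood rather than method-of-moments beta fits.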