Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions

BACKGROUND: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well recognized mechanism of epigenetic gene silencing and often occurs at tumor...

Bibliographic Details
Main authors: Houseman, E Andres; Christensen, Brock C; Yeh, Ru-Fang; Marsit, Carmen J; Karagas, Margaret R; Wrensch, Margaret; Nelson, Heather H; Wiemels, Joseph; Zheng, Shichun; Wiencke, John K; Kelsey, Karl T
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553421/
https://www.ncbi.nlm.nih.gov/pubmed/18782434
http://dx.doi.org/10.1186/1471-2105-9-365
author Houseman, E Andres
Christensen, Brock C
Yeh, Ru-Fang
Marsit, Carmen J
Karagas, Margaret R
Wrensch, Margaret
Nelson, Heather H
Wiemels, Joseph
Zheng, Shichun
Wiencke, John K
Kelsey, Karl T
collection PubMed
description BACKGROUND: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner. RESULTS: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on the normal tissue samples and show that the clusters are associated with tissue type as well as age. CONCLUSION: Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
format Text
id pubmed-2553421
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2553421 2008-09-26 BMC Bioinformatics Methodology Article
BioMed Central 2008-09-09 /pmc/articles/PMC2553421/ /pubmed/18782434 http://dx.doi.org/10.1186/1471-2105-9-365 Text en Copyright © 2008 Houseman et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553421/
https://www.ncbi.nlm.nih.gov/pubmed/18782434
http://dx.doi.org/10.1186/1471-2105-9-365