
Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions


Bibliographic Details
Main authors:	Houseman, E Andres, Christensen, Brock C, Yeh, Ru-Fang, Marsit, Carmen J, Karagas, Margaret R, Wrensch, Margaret, Nelson, Heather H, Wiemels, Joseph, Zheng, Shichun, Wiencke, John K, Kelsey, Karl T
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553421/ https://www.ncbi.nlm.nih.gov/pubmed/18782434 http://dx.doi.org/10.1186/1471-2105-9-365

collection	PubMed
description	BACKGROUND: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well-recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner. RESULTS: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on normal tissue samples and show that the clusters are associated with tissue type as well as age. CONCLUSION: Our proposed recursively partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
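The abstract describes clustering samples with a mixture of beta distributions by recursively partitioning them into two classes at a time. The Python sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. It rests on several assumptions of its own: loci are treated as independent within a class; the M-step uses weighted method-of-moments beta estimates instead of exact maximum likelihood; EM starts from a median split of per-sample mean methylation; each node is split with hard assignments at posterior probability 0.5; and splitting stops when the two-class fit does not improve BIC. The names `fit_two_class` and `rpmm_sketch` and all tuning parameters (`min_size`, `max_depth`, `iters`, `tol`) are illustrative.

```python
# Hypothetical sketch of recursive two-class splitting in a beta mixture model.
# Not the published algorithm: M-step, initialization and stopping rule are assumptions.
import math
import random
from statistics import median

EPS = 1e-12  # floor on posterior probabilities so neither class weight reaches zero


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x, assuming 0 < x < 1."""
    return ((a - 1) * math.log(x) + (b - 1) * math.log1p(-x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def weighted_beta_mom(xs, ws):
    """Weighted method-of-moments (a, b) for one locus; an approximation, not the MLE."""
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    # Clamp the variance so that both shape parameters stay positive.
    var = min(max(var, 1e-6), 0.999 * mean * (1 - mean))
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def fit_two_class(X, iters=200, tol=1e-8):
    """EM for a 2-class beta mixture with loci independent within each class.

    X is a list of samples, each a list of beta values in (0, 1). Returns the
    posterior probability of class 0 for every sample and the log-likelihood.
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    # Heuristic start (an assumption): split at the median of per-sample means.
    means = [sum(row) / p for row in X]
    cut = median(means)
    r0 = [0.9 if m > cut else 0.1 for m in means]
    loglik = -math.inf
    for _ in range(iters):
        # M-step: mixing weight and per-class, per-locus beta parameters.
        pi0 = sum(r0) / n
        par0 = [weighted_beta_mom(c, r0) for c in cols]
        par1 = [weighted_beta_mom(c, [1 - r for r in r0]) for c in cols]
        # E-step: posterior class probabilities, computed with log-sum-exp.
        new_ll, new_r0 = 0.0, []
        for row in X:
            l0 = math.log(pi0) + sum(beta_logpdf(x, *ab) for x, ab in zip(row, par0))
            l1 = math.log(1 - pi0) + sum(beta_logpdf(x, *ab) for x, ab in zip(row, par1))
            m = max(l0, l1)
            lse = m + math.log(math.exp(l0 - m) + math.exp(l1 - m))
            new_r0.append(min(max(math.exp(l0 - lse), EPS), 1 - EPS))
            new_ll += lse
        r0, converged = new_r0, abs(new_ll - loglik) < tol
        loglik = new_ll
        if converged:
            break
    return r0, loglik


def rpmm_sketch(X, min_size=4, depth=0, max_depth=4):
    """Recursively split samples in two while a 2-class fit beats 1 class on BIC."""
    n, p = len(X), len(X[0])
    if n < 2 * min_size or depth >= max_depth:
        return [X]
    one = [weighted_beta_mom([row[j] for row in X], [1.0] * n) for j in range(p)]
    ll1 = sum(beta_logpdf(row[j], *one[j]) for row in X for j in range(p))
    r0, ll2 = fit_two_class(X)
    # BIC = -2 logL + (#params) log n: 2p parameters for one class, 4p + 1 for two.
    bic1 = -2 * ll1 + 2 * p * math.log(n)
    bic2 = -2 * ll2 + (4 * p + 1) * math.log(n)
    left = [row for row, r in zip(X, r0) if r >= 0.5]
    right = [row for row, r in zip(X, r0) if r < 0.5]
    if bic2 >= bic1 or min(len(left), len(right)) < min_size:
        return [X]
    return (rpmm_sketch(left, min_size, depth + 1, max_depth)
            + rpmm_sketch(right, min_size, depth + 1, max_depth))


def clip(v):
    """Keep simulated values strictly inside (0, 1)."""
    return min(max(v, 1e-6), 1 - 1e-6)


# Usage: two simulated groups of 20 samples at 5 loci, one mostly
# unmethylated (Beta(2, 8)) and one mostly methylated (Beta(8, 2)).
rng = random.Random(1)
low = [[clip(rng.betavariate(2, 8)) for _ in range(5)] for _ in range(20)]
high = [[clip(rng.betavariate(8, 2)) for _ in range(5)] for _ in range(20)]
clusters = rpmm_sketch(low + high)
print([len(c) for c in clusters])
```

Because method-of-moments estimates are not exact likelihood maximizers, the BIC comparison in this sketch is only approximate; a fuller implementation would use weighted maximum-likelihood estimation and might carry posterior weights, rather than hard assignments, down the tree.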
id	pubmed-2553421
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
Journal:	BMC Bioinformatics
Published online:	2008-09-09 (BioMed Central)
Copyright:	© 2008 Houseman et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Similar items