
Cluster analysis for DNA methylation profiles having a detection threshold

BACKGROUND: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generate...

Bibliographic Details
Main authors:	Marjoram, Paul, Chang, Jing, Laird, Peter W, Siegmund, Kimberly D
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1555616/ https://www.ncbi.nlm.nih.gov/pubmed/16872497 http://dx.doi.org/10.1186/1471-2105-7-361

author	Marjoram, Paul Chang, Jing Laird, Peter W Siegmund, Kimberly D
collection	PubMed
description	BACKGROUND: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. RESULTS: We compare performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. CONCLUSION: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.
format	Text
id	pubmed-1555616
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-15556162006-08-26 Cluster analysis for DNA methylation profiles having a detection threshold Marjoram, Paul Chang, Jing Laird, Peter W Siegmund, Kimberly D BMC Bioinformatics Methodology Article BACKGROUND: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. RESULTS: We compare performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. CONCLUSION: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties. BioMed Central 2006-07-26 /pmc/articles/PMC1555616/ /pubmed/16872497 http://dx.doi.org/10.1186/1471-2105-7-361 Text en Copyright © 2006 Marjoram et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Cluster analysis for DNA methylation profiles having a detection threshold
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1555616/ https://www.ncbi.nlm.nih.gov/pubmed/16872497 http://dx.doi.org/10.1186/1471-2105-7-361
