
An evaluation of statistical methods for DNA methylation microarray data analysis

BACKGROUND: DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches such as Wilcoxo...
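The record does not spell out how the two measures relate. As a minimal sketch, assuming the commonly used logit relationship M = log2(β / (1 − β)), the conversion can be written as below. The function names and the clamping constant `eps` are illustrative choices, not taken from the article.

```python
import math


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a beta value (proportion in [0, 1]) to an M value.

    Assumes the standard relationship M = log2(beta / (1 - beta)).
    `eps` is a hypothetical guard that keeps beta away from 0 and 1,
    where the logarithm would be undefined.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m: float) -> float:
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


if __name__ == "__main__":
    for beta in (0.1, 0.5, 0.8):
        print(beta, beta_to_m(beta))
```

β values are bounded between 0 and 1, while M values are unbounded, so the two scales can behave differently under tests that assume roughly constant variance.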


Bibliographic Details
Main authors: Li, Dongmei, Xie, Zidian, Le Pape, Marc, Dye, Timothy
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497424/
https://www.ncbi.nlm.nih.gov/pubmed/26156501
http://dx.doi.org/10.1186/s12859-015-0641-x
_version_ 1782380512531185664
author Li, Dongmei
Xie, Zidian
Le Pape, Marc
Dye, Timothy
author_facet Li, Dongmei
Xie, Zidian
Le Pape, Marc
Dye, Timothy
author_sort Li, Dongmei
collection PubMed
description BACKGROUND: DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches such as Wilcoxon rank sum test, t-test, Kolmogorov–Smirnov test, permutation test, empirical Bayes method, and bump hunting method. Nonetheless, selection of an optimal statistical method can be challenging when different methods generate inconsistent results from the same data set. RESULTS: We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate control, statistical power, and stability through simulation studies and real data examples. Observable differences were noticed between β values and M values only when methylation levels were correlated across CpG loci. For small sample size (n=3 or 6 in each group), both the empirical Bayes and bump hunting methods showed appropriate FDR control and the highest power when methylation levels across CpG loci were independent. Only the bump hunting method showed appropriate FDR control and the highest power when methylation levels across CpG sites were correlated. For medium (n=12 in each group) and large sample sizes (n=24 in each group), all methods compared had similar power, except for the permutation test whenever the proportion of differentially methylated loci was low. For all sample sizes, the bump hunting method had the lowest stability in terms of standard deviation of total discoveries whenever the proportion of differentially methylated loci was large. The apparent test power comparisons based on raw p-values from DNA methylation studies on ovarian cancer and rheumatoid arthritis provided results as consistent as those obtained in the simulation studies. 
Overall, these results provide guidance for optimal statistical methods selection under different scenarios. CONCLUSIONS: For DNA methylation studies with small sample size, the bump hunting method and the empirical Bayes method are recommended when DNA methylation levels across CpG loci are independent, while only the bump hunting method is recommended when DNA methylation levels are correlated across CpG loci. All methods are acceptable for medium or large sample sizes.
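The abstract compares methods by false discovery rate control but does not name the adjustment procedure. As a rough sketch only, assuming the Benjamini-Hochberg step-up procedure (one common choice, not confirmed by this record), per-locus raw p-values could be adjusted as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here purely for illustration; the record
    does not say which FDR procedure the authors applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four CpG loci
    print(benjamini_hochberg(raw))
```

A locus would then be declared differentially methylated when its adjusted p-value falls below the chosen FDR level, for example 0.05.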
format Online
Article
Text
id pubmed-4497424
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-44974242015-07-10 An evaluation of statistical methods for DNA methylation microarray data analysis Li, Dongmei Xie, Zidian Le Pape, Marc Dye, Timothy BMC Bioinformatics Research Article BACKGROUND: DNA methylation offers an excellent example for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches such as Wilcoxon rank sum test, t-test, Kolmogorov–Smirnov test, permutation test, empirical Bayes method, and bump hunting method. Nonetheless, selection of an optimal statistical method can be challenging when different methods generate inconsistent results from the same data set. RESULTS: We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate control, statistical power, and stability through simulation studies and real data examples. Observable differences were noticed between β values and M values only when methylation levels were correlated across CpG loci. For small sample size (n=3 or 6 in each group), both the empirical Bayes and bump hunting methods showed appropriate FDR control and the highest power when methylation levels across CpG loci were independent. Only the bump hunting method showed appropriate FDR control and the highest power when methylation levels across CpG sites were correlated. For medium (n=12 in each group) and large sample sizes (n=24 in each group), all methods compared had similar power, except for the permutation test whenever the proportion of differentially methylated loci was low. For all sample sizes, the bump hunting method had the lowest stability in terms of standard deviation of total discoveries whenever the proportion of differentially methylated loci was large. 
The apparent test power comparisons based on raw p-values from DNA methylation studies on ovarian cancer and rheumatoid arthritis provided results as consistent as those obtained in the simulation studies. Overall, these results provide guidance for optimal statistical methods selection under different scenarios. CONCLUSIONS: For DNA methylation studies with small sample size, the bump hunting method and the empirical Bayes method are recommended when DNA methylation levels across CpG loci are independent, while only the bump hunting method is recommended when DNA methylation levels are correlated across CpG loci. All methods are acceptable for medium or large sample sizes. BioMed Central 2015-07-10 /pmc/articles/PMC4497424/ /pubmed/26156501 http://dx.doi.org/10.1186/s12859-015-0641-x Text en © Li et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Li, Dongmei
Xie, Zidian
Le Pape, Marc
Dye, Timothy
An evaluation of statistical methods for DNA methylation microarray data analysis
title An evaluation of statistical methods for DNA methylation microarray data analysis
title_full An evaluation of statistical methods for DNA methylation microarray data analysis
title_fullStr An evaluation of statistical methods for DNA methylation microarray data analysis
title_full_unstemmed An evaluation of statistical methods for DNA methylation microarray data analysis
title_short An evaluation of statistical methods for DNA methylation microarray data analysis
title_sort evaluation of statistical methods for dna methylation microarray data analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497424/
https://www.ncbi.nlm.nih.gov/pubmed/26156501
http://dx.doi.org/10.1186/s12859-015-0641-x
work_keys_str_mv AT lidongmei anevaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT xiezidian anevaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT lepapemarc anevaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT dyetimothy anevaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT lidongmei evaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT xiezidian evaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT lepapemarc evaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis
AT dyetimothy evaluationofstatisticalmethodsfordnamethylationmicroarraydataanalysis