On the potential of models for location and scale for genome-wide DNA methylation data

Bibliographic Details
Main authors: Wahl, Simone, Fenske, Nora, Zeilinger, Sonja, Suhre, Karsten, Gieger, Christian, Waldenberger, Melanie, Grallert, Harald, Schmid, Matthias
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4227139/
https://www.ncbi.nlm.nih.gov/pubmed/24994026
http://dx.doi.org/10.1186/1471-2105-15-232
collection PubMed
description BACKGROUND: Epigenome-wide association studies (EWAS) are increasing our knowledge of the role that epigenetic mechanisms such as DNA methylation play in disease processes. EWAS also aid the understanding of behavioral and environmental effects on DNA methylation. The characteristics of methylation data pose specific challenges for statistical analysis. First, methylation β-values are proportions with skewed and heteroscedastic distributions, so traditional modeling strategies that assume a normally distributed response might not be appropriate. Second, recent evidence suggests that not only mean differences but also variability in site-specific DNA methylation is associated with diseases, including cancer. The purpose of this study was to compare different modeling strategies for methylation data in terms of model performance and the performance of downstream hypothesis tests. Specifically, we used the generalized additive models for location, scale and shape (GAMLSS) framework to compare beta regression with Gaussian regression on raw, binary logit-transformed and arcsine square root-transformed methylation data, with and without modeling a covariate effect on the scale parameter.
RESULTS: Using simulated data and real data from a large population-based study and from an independent sample of cancer patients and healthy controls, we show that beta regression does not outperform competing strategies in terms of model performance. In addition, Gaussian models for location and scale performed better than models for location only. The best performance was observed for the Gaussian model on binary logit-transformed β-values, referred to as M-values. Our results further suggest that models for location and scale are particularly sensitive to violations of the distributional assumption and to outliers in the methylation data. Therefore, we propose a resampling procedure as a mode of inference and show that it reduces the type I error rate in practically relevant settings. We apply the proposed method in an EWAS of BMI and age and reveal strong associations of age with methylation variability that are validated in an independent sample.
CONCLUSIONS: Models for location and scale are promising tools for EWAS that may help us understand how environmental factors and disease-related phenotypes influence methylation variability, and what role this variability plays during disease development.
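The abstract compares Gaussian models for location only with Gaussian models for location and scale, fitted to transformed β-values. A minimal pure-Python sketch of those ingredients follows, under stated assumptions: it uses the standard definitions M = log2(β/(1 − β)) and arcsin(√β), a single covariate `x` (for example age), and a log link for the scale, log σ = c + d·x. The clipping offset `eps` and the function names `m_value`, `arcsine_sqrt` and `fit_location_scale` are hypothetical; this is neither the authors' code nor the API of the gamlss R package. The fit alternates a weighted least-squares update of the mean with a Fisher-scoring update of log σ, similar in spirit to the cyclic fitting algorithms used in GAMLSS.

```python
import math
import random


def m_value(beta, eps=1e-6):
    """Binary (base-2) logit of a methylation beta-value, i.e. the M-value."""
    # Clip away from 0 and 1 so the logit stays finite; eps is an assumed offset.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def arcsine_sqrt(beta):
    """Arcsine square root transform of a beta-value in [0, 1]."""
    return math.asin(math.sqrt(beta))


def _solve2(a11, a12, a22, r1, r2):
    """Solve the symmetric 2x2 system [[a11, a12], [a12, a22]] @ z = (r1, r2)."""
    det = a11 * a22 - a12 * a12
    return (a22 * r1 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det


def fit_location_scale(x, y, iters=200, tol=1e-10):
    """Maximum likelihood fit of y ~ N(a + b*x, sigma^2) with log(sigma) = c + d*x."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    mean_y = sum(y) / n
    a, b, d = 0.0, 0.0, 0.0
    # Start sigma at the marginal SD of y so that early scale steps do not overshoot.
    c = 0.5 * math.log(sum((yi - mean_y) ** 2 for yi in y) / n)
    for _ in range(iters):
        # Location step: weighted least squares with weights 1 / sigma_i^2.
        w = [math.exp(-2.0 * (c + d * xi)) for xi in x]
        a_new, b_new = _solve2(
            sum(w),
            sum(wi * xi for wi, xi in zip(w, x)),
            sum(wi * xi * xi for wi, xi in zip(w, x)),
            sum(wi * yi for wi, yi in zip(w, y)),
            sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)),
        )
        # Scale step: one Fisher-scoring update of (c, d). The score for log(sigma_i)
        # is r_i^2 / sigma_i^2 - 1 and its expected information is 2 per observation.
        u = [wi * (yi - a_new - b_new * xi) ** 2 - 1.0 for wi, xi, yi in zip(w, x, y)]
        dc, dd = _solve2(2.0 * n, 2.0 * sx, 2.0 * sxx,
                         sum(u), sum(ui * xi for ui, xi in zip(u, x)))
        change = max(abs(a_new - a), abs(b_new - b), abs(dc), abs(dd))
        a, b, c, d = a_new, b_new, c + dc, d + dd
        if change < tol:
            break
    # Wald standard error of d from the inverse expected information 2 * X^T X.
    se_d = math.sqrt(n / (2.0 * (n * sxx - sx * sx)))
    return {"a": a, "b": b, "c": c, "d": d, "se_d": se_d}


if __name__ == "__main__":
    # Hypothetical simulated CpG site: both the mean and the spread of the
    # M-value increase with the covariate (age in decades).
    random.seed(1)
    age = [random.uniform(2.0, 8.0) for _ in range(2000)]
    m_true = [-1.0 + 0.3 * t + math.exp(-1.5 + 0.2 * t) * random.gauss(0.0, 1.0)
              for t in age]
    beta = [1.0 / (1.0 + 2.0 ** (-m)) for m in m_true]  # back-transform to beta-values
    fit = fit_location_scale(age, [m_value(bv) for bv in beta])
    print(fit, "Wald z for the scale effect:", fit["d"] / fit["se_d"])
```

The Wald statistic `d / se_d` is the kind of parametric test of a covariate effect on methylation variability that, according to the abstract, is sensitive to distributional misspecification and outliers; the authors propose resampling-based inference instead, which this sketch does not attempt to reproduce.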
id pubmed-4227139
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Research Article), published by BioMed Central on 2014-07-03
rights Copyright © 2014 Wahl et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article