Estimands in epigenome-wide association studies

BACKGROUND: In DNA methylation analyses like epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially...

Bibliographic Details
Main authors: Kruppa, Jochen; Sieg, Miriam; Richter, Gesa; Pohrt, Anne
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086103/
https://www.ncbi.nlm.nih.gov/pubmed/33926513
http://dx.doi.org/10.1186/s13148-021-01083-9
author Kruppa, Jochen
Sieg, Miriam
Richter, Gesa
Pohrt, Anne
collection PubMed
description BACKGROUND: In DNA methylation analyses like epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks are proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both. RESULTS: We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatical pipelines, which often demand a predefined outcome. We show the relationship between differences in M-values and differences in Beta-values in two data simulations: an analysis with and an analysis without a confounder effect. In the absence of confounder effects, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the deviations and correct the effects by the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results. CONCLUSIONS: The usage of M-values in the analysis of DNA methylation data will produce effect estimates that cannot be biologically interpreted. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if the differences in Beta-values are the focus of the study, the intercept method is recommended. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13148-021-01083-9.
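The abstract's point that differences of M-values have no direct biological reading can be illustrated with a small sketch. It assumes the commonly used logit relation M = log2(Beta / (1 - Beta)), which this record does not state; the function names are hypothetical and this is not the authors' code or their intercept method.

```python
import math


def beta_to_m(beta: float) -> float:
    """Base-2 logit transform of a Beta-value in (0, 1) (assumed relation)."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m: map an M-value back to the Beta scale."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def beta_diff_for_m_shift(baseline_beta: float, m_shift: float = 1.0) -> float:
    """Beta-value difference produced by a fixed M-value shift at a baseline."""
    shifted_m = beta_to_m(baseline_beta) + m_shift
    return m_to_beta(shifted_m) - baseline_beta


if __name__ == "__main__":
    # The same M-value difference maps to different Beta-value differences,
    # depending on the baseline methylation level.
    for baseline in (0.5, 0.9):
        diff = beta_diff_for_m_shift(baseline)
        print(f"baseline Beta={baseline:.2f}: M shift of 1 -> Beta diff {diff:.4f}")
```

Under this assumed relation, an M-value shift of 1 at a baseline Beta of 0.5 changes Beta by about 0.167, while the same shift at a baseline of 0.9 changes it by about 0.047, which is why an M-value effect estimate alone does not translate into a single percentage of methylation.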
format Online
Article
Text
id pubmed-8086103
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8086103 2021-04-30 Estimands in epigenome-wide association studies Kruppa, Jochen Sieg, Miriam Richter, Gesa Pohrt, Anne Clin Epigenetics Methodology BioMed Central 2021-04-29 /pmc/articles/PMC8086103/ /pubmed/33926513 http://dx.doi.org/10.1186/s13148-021-01083-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Estimands in epigenome-wide association studies
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086103/
https://www.ncbi.nlm.nih.gov/pubmed/33926513
http://dx.doi.org/10.1186/s13148-021-01083-9