Estimands in epigenome-wide association studies
BACKGROUND: In DNA methylation analyses like epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially...
Main authors: | Kruppa, Jochen; Sieg, Miriam; Richter, Gesa; Pohrt, Anne |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086103/ https://www.ncbi.nlm.nih.gov/pubmed/33926513 http://dx.doi.org/10.1186/s13148-021-01083-9 |
_version_ | 1783686458940325888 |
---|---|
author | Kruppa, Jochen; Sieg, Miriam; Richter, Gesa; Pohrt, Anne |
author_facet | Kruppa, Jochen; Sieg, Miriam; Richter, Gesa; Pohrt, Anne |
author_sort | Kruppa, Jochen |
collection | PubMed |
description | BACKGROUND: In DNA methylation analyses such as epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks are proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both. RESULTS: We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatical pipelines, which often demand a predefined outcome. We show the dependence between differences in M-values and differences in Beta-values in two data simulations: an analysis with and an analysis without a confounder effect. When no confounder effects are present, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the deviations and correct the effects by the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results. CONCLUSIONS: The usage of M-values in the analysis of DNA methylation data will produce effect estimates that cannot be biologically interpreted. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if the differences in Beta-values are the focus of the study, the intercept method is recommended. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13148-021-01083-9. |
format | Online Article Text |
id | pubmed-8086103 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8086103 2021-04-30 Estimands in epigenome-wide association studies Kruppa, Jochen; Sieg, Miriam; Richter, Gesa; Pohrt, Anne Clin Epigenetics Methodology BACKGROUND: In DNA methylation analyses such as epigenome-wide association studies, effects in differentially methylated CpG sites are assessed. Two kinds of outcomes can be used for statistical analysis: Beta-values and M-values. M-values follow a normal distribution and help to detect differentially methylated CpG sites. As biological effect measures, differences of M-values are more or less meaningless. Beta-values are of more interest since they can be interpreted directly as differences in percentage of DNA methylation at a given CpG site, but they have poor statistical properties. Different frameworks are proposed for reporting estimands in DNA methylation analysis, relying on Beta-values, M-values, or both. RESULTS: We present and discuss four possible approaches to obtaining estimands in DNA methylation analysis. In addition, we present the usage of M-values or Beta-values in the context of bioinformatical pipelines, which often demand a predefined outcome. We show the dependence between differences in M-values and differences in Beta-values in two data simulations: an analysis with and an analysis without a confounder effect. When no confounder effects are present, M-values can be used for the statistical analysis and Beta-value statistics for the reporting. If confounder effects exist, we demonstrate the deviations and correct the effects by the intercept method. Finally, we demonstrate the theoretical problem on two large human genome-wide DNA methylation datasets to verify the results. CONCLUSIONS: The usage of M-values in the analysis of DNA methylation data will produce effect estimates that cannot be biologically interpreted. The parallel usage of Beta-value statistics ignores possible confounder effects and therefore cannot be recommended. Hence, if the differences in Beta-values are the focus of the study, the intercept method is recommended. Hyper- or hypomethylated CpG sites must then be carefully evaluated. If an exploratory analysis of possible CpG sites is the aim of the study, M-values can be used for inference. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13148-021-01083-9. BioMed Central 2021-04-29 /pmc/articles/PMC8086103/ /pubmed/33926513 http://dx.doi.org/10.1186/s13148-021-01083-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology; Kruppa, Jochen; Sieg, Miriam; Richter, Gesa; Pohrt, Anne; Estimands in epigenome-wide association studies |
title | Estimands in epigenome-wide association studies |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086103/ https://www.ncbi.nlm.nih.gov/pubmed/33926513 http://dx.doi.org/10.1186/s13148-021-01083-9 |
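
The description above contrasts differences in M-values with differences in Beta-values. The record itself does not state the transformation between the two scales, so the following minimal sketch assumes the commonly used relation M = log2(Beta / (1 - Beta)). The function names `beta_to_m`, `m_to_beta`, and `beta_difference` are hypothetical and chosen only for illustration. The sketch shows why a fixed M-value difference is hard to interpret biologically: the same shift on the M scale corresponds to different changes in methylation percentage depending on the baseline Beta-value. It does not implement the intercept method mentioned in the abstract.

```python
import math


def beta_to_m(beta: float) -> float:
    """Map a Beta-value in (0, 1) to an M-value.

    Assumes the commonly used base-2 logit relation M = log2(beta / (1 - beta));
    this formula is not given in the catalog record itself.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse of beta_to_m: map an M-value back to a Beta-value."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


def beta_difference(baseline_beta: float, m_shift: float) -> float:
    """Change in Beta-value caused by adding m_shift on the M scale."""
    shifted = m_to_beta(beta_to_m(baseline_beta) + m_shift)
    return shifted - baseline_beta


if __name__ == "__main__":
    # The same M-value difference of 1 at three illustrative baselines.
    for baseline in (0.05, 0.50, 0.95):
        delta = beta_difference(baseline, 1.0)
        print(f"baseline Beta = {baseline:.2f}, Beta difference = {delta:.4f}")
```

Under this assumed relation, a shift of one M-value unit moves a half-methylated site much further on the Beta scale than a site that is nearly unmethylated or nearly fully methylated, which is consistent with the abstract's point that M-value differences alone are not a direct measure of change in methylation percentage.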