Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies
BACKGROUND: A large number of analysis strategies are available for DNA methylation (DNAm) array and RNA-seq datasets, but it is unclear which strategies are best to use. We compare commonly used strategies and report how they influence results in large cohort studies. RESULTS: We tested the associa...
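Main authors: | van Rooij, Jeroen; Mandaviya, Pooja R.; Claringbould, Annique; Felix, Janine F.; van Dongen, Jenny; Jansen, Rick; Franke, Lude; ’t Hoen, Peter A. C.; Heijmans, Bas; van Meurs, Joyce B. J. |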
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857161/ https://www.ncbi.nlm.nih.gov/pubmed/31727104 http://dx.doi.org/10.1186/s13059-019-1878-x |
_version_ | 1783470710480437248 |
---|---|
author | van Rooij, Jeroen; Mandaviya, Pooja R.; Claringbould, Annique; Felix, Janine F.; van Dongen, Jenny; Jansen, Rick; Franke, Lude; ’t Hoen, Peter A. C.; Heijmans, Bas; van Meurs, Joyce B. J. |
collection | PubMed |
description | BACKGROUND: A large number of analysis strategies are available for DNA methylation (DNAm) array and RNA-seq datasets, but it is unclear which strategies are best to use. We compare commonly used strategies and report how they influence results in large cohort studies. RESULTS: We tested the associations of DNAm and RNA expression with age, BMI, and smoking in four different cohorts (n = ~ 2900). By comparing strategies against the base model on the number and percentage of replicated CpGs for DNAm analyses or genes for RNA-seq analyses in a leave-one-out cohort replication approach, we find the choice of the normalization method and statistical test does not strongly influence the results for DNAm array data. However, adjusting for cell counts or hidden confounders substantially decreases the number of replicated CpGs for age and increases the number of replicated CpGs for BMI and smoking. For RNA-seq data, the choice of the normalization method, gene expression inclusion threshold, and statistical test does not strongly influence the results. Including five principal components or excluding correction of technical covariates or cell counts decreases the number of replicated genes. CONCLUSIONS: Results were not influenced by the normalization method or statistical test. However, the correction method for cell counts, technical covariates, principal components, and/or hidden confounders does influence the results. |
format | Online Article Text |
id | pubmed-6857161 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6857161 2019-11-29 Genome Biol Research BioMed Central 2019-11-14 /pmc/articles/PMC6857161/ /pubmed/31727104 http://dx.doi.org/10.1186/s13059-019-1878-x Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857161/ https://www.ncbi.nlm.nih.gov/pubmed/31727104 http://dx.doi.org/10.1186/s13059-019-1878-x |