Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies

Bibliographic Details
Main Authors: van Rooij, Jeroen, Mandaviya, Pooja R., Claringbould, Annique, Felix, Janine F., van Dongen, Jenny, Jansen, Rick, Franke, Lude, ’t Hoen, Peter A. C., Heijmans, Bas, van Meurs, Joyce B. J.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857161/
https://www.ncbi.nlm.nih.gov/pubmed/31727104
http://dx.doi.org/10.1186/s13059-019-1878-x
collection PubMed
description BACKGROUND: A large number of analysis strategies are available for DNA methylation (DNAm) array and RNA-seq datasets, but it is unclear which strategies are best to use. We compare commonly used strategies and report how they influence results in large cohort studies. RESULTS: We tested the associations of DNAm and RNA expression with age, BMI, and smoking in four different cohorts (n = ~ 2900). By comparing strategies against the base model on the number and percentage of replicated CpGs for DNAm analyses or genes for RNA-seq analyses in a leave-one-out cohort replication approach, we find the choice of the normalization method and statistical test does not strongly influence the results for DNAm array data. However, adjusting for cell counts or hidden confounders substantially decreases the number of replicated CpGs for age and increases the number of replicated CpGs for BMI and smoking. For RNA-seq data, the choice of the normalization method, gene expression inclusion threshold, and statistical test does not strongly influence the results. Including five principal components or excluding correction of technical covariates or cell counts decreases the number of replicated genes. CONCLUSIONS: Results were not influenced by the normalization method or statistical test. However, the correction method for cell counts, technical covariates, principal components, and/or hidden confounders does influence the results.
id pubmed-6857161
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in Genome Biol (section: Research) by BioMed Central on 2019-11-14.
© The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research