Comparison of different cell type correction methods for genome-scale epigenetics studies

Bibliographic Details
Main authors: Kaushal, Akhilesh; Zhang, Hongmei; Karmaus, Wilfried J. J.; Ray, Meredith; Torres, Mylin A.; Smith, Alicia K.; Wang, Shu-Li
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5391562/
https://www.ncbi.nlm.nih.gov/pubmed/28410574
http://dx.doi.org/10.1186/s12859-017-1611-2
collection PubMed
description BACKGROUND: Whole blood is frequently used in genome-wide association studies of DNA methylation patterns in relation to environmental exposures or clinical outcomes. These associations can be confounded by cellular heterogeneity. Algorithms have been developed to measure or adjust for this heterogeneity, and some have been compared in the literature. However, with new methods available, it is unknown whether the findings will be consistent and, if not, which method(s) perform better.
RESULTS: Methods: We compared eight cell-type correction methods: the method in the minfi R package, the method by Houseman et al., the Removing unwanted variation (RUV) approach, the methods in FaST-LMM-EWASher, ReFACTor, RefFreeEWAS, and RefFreeCellMix R programs, and one approach utilizing surrogate variables (SVA). We first evaluated the association of DNA methylation at each CpG across the whole genome with prenatal arsenic exposure levels and with cancer status, adjusted for the estimated cell-type information obtained from the different methods. We then compared the CpGs showing statistical significance under the different approaches. For the methods implemented in minfi and proposed by Houseman et al., we used homogeneous data for which the composition of some blood cells was available and compared the known compositions with the estimated ones. Finally, for methods that do not explicitly estimate cell compositions, we evaluated their performance using simulated DNA methylation data with a set of latent variables representing “cell types”.
Results: Results from the SVA-based method overall showed the highest agreement with all other methods except FaST-LMM-EWASher. Using homogeneous data, minfi provided better estimates of cell types than the originally proposed method by Houseman et al. Further simulation studies on reference-free methods revealed that SVA provided good sensitivities and specificities; RefFreeCellMix in general produced high sensitivities, but its specificities tended to be low when confounding was present; and FaST-LMM-EWASher gave the lowest sensitivity but the highest specificity.
CONCLUSIONS: Results from real data and simulations indicated that SVA is recommended when the focus is on the identification of informative CpGs. When appropriate reference data are available, the method implemented in the minfi package is recommended. However, if no such reference data are available or if the focus is not on estimating cell proportions, the SVA method is suggested.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1611-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5391562
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5391562 2017-04-14 BMC Bioinformatics Research Article BioMed Central 2017-04-14 Text en
© The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comparison of different cell type correction methods for genome-scale epigenetics studies
topic Research Article