In Epigenomic Studies, Including Cell-Type Adjustments in Regression Models Can Introduce Multicollinearity, Resulting in Apparent Reversal of Direction of Association

Background: Association studies of epigenome-wide DNA methylation and disease can inform biological mechanisms. DNA methylation is often measured in peripheral blood, with heterogeneous cell types with different methylation profiles. Influences such as adiposity-associated inflammation can change ce...

Bibliographic Details
Main authors: Barton, Sheila J., Melton, Phillip E., Titcombe, Philip, Murray, Robert, Rauschert, Sebastian, Lillycrop, Karen A., Huang, Rae-Chi, Holbrook, Joanna D., Godfrey, Keith M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6746958/
https://www.ncbi.nlm.nih.gov/pubmed/31552104
http://dx.doi.org/10.3389/fgene.2019.00816
_version_ 1783451790611578880
author Barton, Sheila J.
Melton, Phillip E.
Titcombe, Philip
Murray, Robert
Rauschert, Sebastian
Lillycrop, Karen A.
Huang, Rae-Chi
Holbrook, Joanna D.
Godfrey, Keith M.
author_sort Barton, Sheila J.
collection PubMed
description Background: Association studies of epigenome-wide DNA methylation and disease can inform biological mechanisms. DNA methylation is often measured in peripheral blood, with heterogeneous cell types with different methylation profiles. Influences such as adiposity-associated inflammation can change cell-type proportions, altering measured blood methylation levels. To determine whether associations between loci-specific methylation and outcomes result from cellular heterogeneity, many studies adjust for estimated blood cell proportions, but high correlations between methylation and cell-type proportions could violate the statistical assumption of no multicollinearity. We examined these assumptions in a population-based study.

Methods: CDKN2A promoter CpG methylation was measured in peripheral blood from 812 adolescents aged 17 years (Western Australian Pregnancy Cohort Study). Log(e) adolescent BMI was used as the outcome in a regression analysis with DNA methylation as predictor, adjusting for age/sex. Further regression analyses additionally adjusted for estimated cell-type proportions using the reference-based Houseman method, and simulations modeled the effects of varying levels of correlation between cell proportions and methylation. Correlations between estimated cell proportions and CpG methylation from Illumina 450K were measured.

Results: Lower DNA methylation was associated with higher BMI when cell-type adjustment was not included; for CpG4, β = −0.004 log(e)BMI/%methylation (95% CI −0.0065, −0.001; p = 0.003). The direction of association reversed when adjustment for six cell types was made; for CpG4, β = 0.004 log(e)BMI/%methylation (−0.0002, 0.0089; p = 0.06). Correlations between CpG methylation and cell-type proportions were high, and variance inflation factors (VIFs) were extremely high (25 to 113.7). Granulocyte count was correlated with BMI, and removing granulocytes from the regression model reduced all VIFs to <3.1, with persistence of a positive association between methylation and BMI [CpG4 β = 0.004 log(e)BMI/%methylation (−0.0002, 0.0088; p = 0.06)]. Simulations supported major effects of multicollinearity on regression results.

Conclusions: Where cell types are highly correlated with other covariates in regression models, the statistical assumption of no multicollinearity may be violated. This can result in reversal of direction of association, particularly when examining associations with phenotypes related to inflammation, as CpG methylation may associate with changes in cell-type proportions. Removing predictors with high correlations from regression models may remove the multicollinearity. However, this might hinder biological interpretability.
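
The abstract describes two diagnostics: a coefficient whose sign changes once a highly correlated covariate enters the model, and variance inflation factors that flag the correlation. The Python sketch below illustrates both on synthetic data. It is not the authors' simulation: the variable names (`cell`, `meth`, `outcome`), the effect sizes, the noise levels, and the single cell-type covariate are all assumptions chosen for illustration, and only the sample size of 812 is borrowed from the study. The VIF for a predictor is computed as 1 / (1 − R²), where R² comes from regressing that predictor on the remaining ones.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X, with an intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def vif(X):
    """Variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing
    column j on all other columns. Since R^2 = 1 - SS_res / SS_tot,
    this equals SS_tot / SS_res.
    """
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        beta = ols(others, target)
        fitted = np.column_stack([np.ones(len(target)), others]) @ beta
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        factors.append(ss_tot / ss_res)
    return np.array(factors)


# Synthetic data (illustrative assumptions, not the study's data).
rng = np.random.default_rng(0)
n = 812                                      # sample size borrowed from the abstract
cell = rng.normal(size=n)                    # stand-in for one cell-type proportion
meth = -cell + 0.2 * rng.normal(size=n)      # methylation, strongly correlated with it
outcome = 0.5 * meth + 1.0 * cell + 0.5 * rng.normal(size=n)

# Coefficient of methylation without and with the cell-type covariate.
b_unadjusted = ols(meth, outcome)[1]
b_adjusted = ols(np.column_stack([meth, cell]), outcome)[1]
vifs = vif(np.column_stack([meth, cell]))

print("unadjusted methylation coefficient:", b_unadjusted)
print("adjusted methylation coefficient:  ", b_adjusted)
print("VIFs (methylation, cell type):     ", vifs)
```

In this toy setup the methylation coefficient is negative without the covariate and positive with it, while both VIFs are far above the commonly used threshold of 10. The sketch shows only that such a reversal can coincide with severe multicollinearity; it does not indicate which of the two estimates is closer to the truth in real data.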
format Online
Article
Text
id pubmed-6746958
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-67469582019-09-24 In Epigenomic Studies, Including Cell-Type Adjustments in Regression Models Can Introduce Multicollinearity, Resulting in Apparent Reversal of Direction of Association Barton, Sheila J. Melton, Phillip E. Titcombe, Philip Murray, Robert Rauschert, Sebastian Lillycrop, Karen A. Huang, Rae-Chi Holbrook, Joanna D. Godfrey, Keith M. Front Genet Genetics Frontiers Media S.A. 2019-09-10 /pmc/articles/PMC6746958/ /pubmed/31552104 http://dx.doi.org/10.3389/fgene.2019.00816 Text en Copyright © 2019 Barton, Melton, Titcombe, Murray, Rauschert, Lillycrop, Huang, Holbrook and Godfrey http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title In Epigenomic Studies, Including Cell-Type Adjustments in Regression Models Can Introduce Multicollinearity, Resulting in Apparent Reversal of Direction of Association
topic Genetics