Cargando…
Speeding up interval estimation for R(2)-based mediation effect of high-dimensional mediators via cross-fitting
Mediation analysis is a useful tool in biomedical research to investigate how molecular phenotypes, such as gene expression, mediate the effect of an exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediatio...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Cold Spring Harbor Laboratory
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9934518/ https://www.ncbi.nlm.nih.gov/pubmed/36798366 http://dx.doi.org/10.1101/2023.02.06.527391 |
Sumario: | Mediation analysis is a useful tool in biomedical research to investigate how molecular phenotypes, such as gene expression, mediate the effect of an exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects of opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, a variance-based R-squared total mediation effect measure has been recently proposed, which, nevertheless, relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In this work, we formulate a more efficient two-stage cross-fitted estimation procedure for the R-squared measure. To avoid potential bias, we perform iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares (OLS) regressions for the variance estimation. We then construct confidence intervals based on the newly-derived closed-form asymptotic distribution of the R-squared measure. Extensive simulation studies demonstrate that the proposed procedure is hundreds of times more computationally efficient than the resampling-based method with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and discovered the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol. The proposed cross-fitted interval estimation procedure is implemented in R package RsqMed. |
---|