Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints
Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multip...
Main authors: | Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327421/ https://www.ncbi.nlm.nih.gov/pubmed/37426040 |
_version_ | 1785069622727278592 |
---|---|
author | Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi |
author_facet | Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi |
author_sort | Liu, Molei |
collection | PubMed |
description | Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multiple studies which address the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection allowing between-study heterogeneity and not requiring the sharing of individual level data. Assuming the underlying high dimensional regression models of the data differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discovery and attaining power. The new method is applied to a real example detecting interaction effects of the genetic variants for statins and obesity on the risk for type II diabetes. |
format | Online Article Text |
id | pubmed-10327421 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
record_format | MEDLINE/PubMed |
spelling | pubmed-10327421; 2023-07-07; Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints; Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi; J Mach Learn Res; Article; 2021-04; /pmc/articles/PMC10327421/; /pubmed/37426040; Text; en; License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v22/20-774.html. |
title | Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327421/ https://www.ncbi.nlm.nih.gov/pubmed/37426040 |