
Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multip...


Bibliographic Details
Main authors: Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi
Format: Online Article Text
Language: English
Published: 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327421/
https://www.ncbi.nlm.nih.gov/pubmed/37426040
author Liu, Molei
Xia, Yin
Cho, Kelly
Cai, Tianxi
collection PubMed
description Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multiple studies which address the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection allowing between-study heterogeneity and not requiring the sharing of individual level data. Assuming the underlying high dimensional regression models of the data differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discovery and attaining power. The new method is applied to a real example detecting interaction effects of the genetic variants for statins and obesity on the risk for type II diabetes.
format Online
Article
Text
id pubmed-10327421
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-10327421 2023-07-07 Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi J Mach Learn Res Article
2021-04 /pmc/articles/PMC10327421/ /pubmed/37426040 Text en https://creativecommons.org/licenses/by/4.0/ License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v22/20-774.html.
title Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints
topic Article