
Two‐phase sample selection strategies for design and analysis in post‐genome‐wide association fine‐mapping studies

Bibliographic Details
Main authors: Espin‐Garcia, Osvaldo; Craiu, Radu V.; Bull, Shelley B.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9293221/
https://www.ncbi.nlm.nih.gov/pubmed/34596256
http://dx.doi.org/10.1002/sim.9211
Description
Summary: Post‐GWAS analysis, in many cases, focuses on fine‐mapping targeted genetic regions discovered at GWAS‐stage; that is, the aim is to pinpoint potential causal variants and susceptibility genes for complex traits and disease outcomes using next‐generation sequencing (NGS) technologies. Large‐scale GWAS cohorts are necessary to identify target regions given the typically modest genetic effect sizes. In this context, two‐phase sampling design and analysis is a cost‐reduction technique that utilizes data collected during phase 1 GWAS to select an informative subsample for phase 2 sequencing. The main goal is to make inference for genetic variants measured via NGS by efficiently combining data from phases 1 and 2. We propose two approaches for selecting a phase 2 design under a budget constraint. The first method identifies sampling fractions that select a phase 2 design yielding an asymptotic variance‐covariance matrix with certain optimal characteristics, for example, smallest trace, via Lagrange multipliers (LM). The second relies on a genetic algorithm (GA) with a defined fitness function to identify exactly a phase 2 subsample. We perform comprehensive simulation studies to evaluate the empirical properties of the proposed designs for a genetic association study of a quantitative trait. We compare our methods against two ranked designs: residual‐dependent sampling and a recently identified optimal design. Our findings demonstrate that the proposed designs, GA in particular, can render competitive power in combined phase 1 and 2 analysis compared with alternative designs while preserving type 1 error control. These results are especially evident under the more practical scenario where design values need to be defined a priori and are subject to misspecification. We illustrate the proposed methods in a study of triglyceride levels in the North Finland Birth Cohort of 1966. R code to reproduce our results is available at github.com/egosv/TwoPhase_postGWAS.
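
As a rough illustration of the two‐phase idea, the sketch below shows one common reading of the residual‐dependent sampling comparator mentioned in the summary: regress the phase 1 trait on the GWAS genotype, rank the residuals, and send the subjects in both tails to phase 2 sequencing. It is not taken from the authors' R code. The function name, the single‐SNP linear model, the equal split between tails, and the simulated data are all assumptions made for this example, and the paper's exact definition may differ.

```python
import random
from statistics import fmean


def residual_dependent_sample(y, g, n2):
    """Pick n2 phase 1 subjects with the most extreme trait residuals.

    y  : quantitative trait values for all phase 1 subjects
    g  : phase 1 (GWAS) genotype for the same subjects, coded 0/1/2
    n2 : number of subjects the budget allows for phase 2 sequencing

    Returns (sorted list of selected indices, list of residuals).
    Hypothetical helper for illustration only.
    """
    if not y or len(y) != len(g):
        raise ValueError("y and g must be non-empty and of equal length")
    if not 0 <= n2 <= len(y):
        raise ValueError("n2 must lie between 0 and the phase 1 sample size")

    # Simple least-squares fit of y on g (intercept and slope).
    y_bar, g_bar = fmean(y), fmean(g)
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = y_bar - slope * g_bar
    resid = [yi - (intercept + slope * gi) for yi, gi in zip(y, g)]

    # Rank subjects by residual and take both tails (assumed equal split).
    order = sorted(range(len(y)), key=resid.__getitem__)
    n_low = n2 // 2
    n_high = n2 - n_low
    chosen = order[:n_low] + order[len(order) - n_high:]
    return sorted(chosen), resid


# Simulated phase 1 cohort (made-up effect size and allele frequency).
random.seed(2021)
N = 1000
g = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(N)]
y = [0.2 * gi + random.gauss(0, 1) for gi in g]

phase2_ids, resid = residual_dependent_sample(y, g, n2=200)
print(len(phase2_ids), "subjects selected for phase 2 sequencing")
```

A ranked design like this needs only phase 1 data. The LM and GA approaches described in the summary instead optimize a criterion based on the asymptotic variance‐covariance matrix, which this sketch does not attempt.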