The hidden factor: accounting for covariate effects in power and sample size computation for a binary trait

MOTIVATION: Accurate power and sample size estimation is crucial to the design and analysis of genetic association studies. When analyzing a binary trait via logistic regression, important covariates such as age and sex are typically included in the model. However, their effects are rarely properly...

Bibliographic Details
Main authors: Zhang, Ziang; Sun, Lei
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10070038/
https://www.ncbi.nlm.nih.gov/pubmed/36943372
http://dx.doi.org/10.1093/bioinformatics/btad139
collection PubMed
description MOTIVATION: Accurate power and sample size estimation is crucial to the design and analysis of genetic association studies. When analyzing a binary trait via logistic regression, important covariates such as age and sex are typically included in the model. However, their effects are rarely properly considered in power or sample size computation during study planning. Unlike the analysis of a continuous trait, the power of association testing between a binary trait and a genetic variant depends explicitly on covariate effects, even under the assumption of gene–environment independence. Earlier work recognizes this hidden factor, but the implemented methods are not flexible. We thus propose and implement a generalized method for estimating power and sample size for (discovery or replication) association studies of binary traits that (i) accommodates different types of nongenetic covariates E, (ii) deals with different types of G–E relationships, and (iii) is computationally efficient.
RESULTS: Extensive simulation studies show that the proposed method is accurate and computationally efficient for both prospective and retrospective sampling designs with various covariate structures. A proof-of-principle application focused on the understudied African sample in the UK Biobank data. Results show that, in contrast to studying the continuous blood pressure trait, when analyzing the binary hypertension trait, ignoring the covariate effects of age and sex leads to overestimated power and underestimated replication sample size.
AVAILABILITY AND IMPLEMENTATION: The simulated datasets can be found on the online web-page of this manuscript, and the UK Biobank application data can be accessed at https://www.ukbiobank.ac.uk. The R package SPCompute that implements the proposed method is available at CRAN. The genome-wide association studies are carried out using the software PLINK 2.0 [Purcell et al. (Plink: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559–75.)].
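The claim that power for a binary trait depends on covariate effects, even when G and E are independent, can be illustrated with a small numerical sketch. The code below is not the SPCompute implementation or its API; it is a generic asymptotic Wald-test approximation under assumptions chosen here for illustration: prospective sampling, an additive genotype G in Hardy-Weinberg equilibrium, a single binary covariate E (standing in for something like sex), and hypothetical parameter values. The intercept is solved so that disease prevalence stays fixed while the covariate effect changes.

```python
from itertools import product
from math import exp, log, sqrt
from statistics import NormalDist


def expit(x):
    return 1.0 / (1.0 + exp(-x))


def cells(maf, p_e):
    """Joint distribution of (G, E) under independence: G in {0,1,2} (HWE), E binary."""
    g_probs = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    e_probs = [1 - p_e, p_e]
    return [(g, e, g_probs[g] * e_probs[e]) for g, e in product(range(3), range(2))]


def solve_intercept(prevalence, beta_g, beta_e, grid):
    """Bisection for the intercept that gives the target marginal prevalence."""
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = (lo + hi) / 2
        prev = sum(w * expit(mid + beta_g * g + beta_e * e) for g, e, w in grid)
        if prev < prevalence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wald_power(n, maf, p_e, beta_g, beta_e, prevalence, alpha=0.05):
    """Approximate power of the two-sided Wald test of beta_g = 0 in logistic regression."""
    grid = cells(maf, p_e)
    b0 = solve_intercept(prevalence, beta_g, beta_e, grid)
    # Expected Fisher information per observation: E[p(1-p) x x^T], x = (1, G, E).
    a = [[0.0] * 3 for _ in range(3)]
    for g, e, w in grid:
        p = expit(b0 + beta_g * g + beta_e * e)
        x = (1.0, float(g), float(e))
        for i in range(3):
            for j in range(3):
                a[i][j] += w * p * (1 - p) * x[i] * x[j]
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
           - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
           + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    # Diagonal entry of the inverse information matrix for the G coefficient.
    var_g = (a[0][0] * a[2][2] - a[0][2] * a[2][0]) / det
    z = abs(beta_g) * sqrt(n / var_g)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)


# Hypothetical design values, chosen only for illustration.
common = dict(n=2000, maf=0.3, p_e=0.5, beta_g=log(1.2), prevalence=0.3)
power_ignoring = wald_power(beta_e=0.0, **common)      # covariate effect set to zero
power_with = wald_power(beta_e=log(5.0), **common)     # strong covariate effect
print(f"power ignoring E: {power_ignoring:.3f}, power accounting for E: {power_with:.3f}")
```

With prevalence held fixed, a strong covariate pushes the stratum-specific risks away from the overall prevalence, which lowers the average of p(1-p) and therefore the information about the genetic effect. In this sketch, setting the covariate effect to zero gives the larger power value, in line with the overestimation described in the abstract.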
format Online
Article
Text
id pubmed-10070038
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10070038 2023-04-04 Bioinformatics Original Paper Oxford University Press 2023-03-21 /pmc/articles/PMC10070038/ /pubmed/36943372 http://dx.doi.org/10.1093/bioinformatics/btad139 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Paper