Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico

BACKGROUND: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, the balance of cases vs controls for bot...

Bibliographic Details
Main authors: Zhang, Xinyuan, Basile, Anna O., Pendergrass, Sarah A., Ritchie, Marylyn D.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6343276/
https://www.ncbi.nlm.nih.gov/pubmed/30669967
http://dx.doi.org/10.1186/s12859-018-2591-6
_version_ 1783389252468342784
author Zhang, Xinyuan
Basile, Anna O.
Pendergrass, Sarah A.
Ritchie, Marylyn D.
author_facet Zhang, Xinyuan
Basile, Anna O.
Pendergrass, Sarah A.
Ritchie, Marylyn D.
author_sort Zhang, Xinyuan
collection PubMed
description BACKGROUND: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. RESULTS: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression. 
CONCLUSIONS: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2591-6) contains supplementary material, which is available to authorized users.
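As a rough illustration of how type I error and power can be estimated in silico for balanced and unbalanced designs, the following Python sketch simulates a simple collapsing burden test. It is not the simulation pipeline, the regression model, or the SKAT implementation used in the article: the function names, the pooled two-proportion z-test, the independence assumption behind the carrier probability, and all sample sizes and frequencies are hypothetical choices made for illustration only.

```python
import math
import random


def carrier_probability(maf, n_variants):
    """Chance of carrying at least one rare allele in a region.

    Assumes (for illustration) n_variants independent variants that all
    share the same minor allele frequency, in a diploid individual.
    """
    return 1.0 - (1.0 - maf) ** (2 * n_variants)


def carrier_count(n, carrier_prob, rng):
    """Draw how many of n individuals carry at least one rare allele."""
    return sum(1 for _ in range(n) if rng.random() < carrier_prob)


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    if variance == 0.0:
        return 1.0  # no carriers (or only carriers): nothing to test
    z = (x1 / n1 - x2 / n2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(n_cases, n_controls, p_case, p_control,
                   alpha=0.05, n_sim=1000, seed=1):
    """Fraction of simulated studies in which the test rejects at alpha.

    With p_case == p_control this estimates the type I error rate;
    with p_case != p_control it estimates power.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x_case = carrier_count(n_cases, p_case, rng)
        x_ctrl = carrier_count(n_controls, p_control, rng)
        p = two_proportion_p_value(x_case, n_cases, x_ctrl, n_controls)
        if p < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Hypothetical region: 5 variants, each with MAF 0.01.
    p_null = carrier_probability(0.01, 5)
    for n_cases, n_controls in [(500, 500), (100, 900)]:
        t1e = rejection_rate(n_cases, n_controls, p_null, p_null)
        power = rejection_rate(n_cases, n_controls, 2 * p_null, p_null)
        print(n_cases, n_controls, "type I error:", t1e, "power:", power)
```

Running the script prints an estimated type I error rate and power for one balanced and one unbalanced design of the same total size. The estimates depend on the seed and on `n_sim`, and they describe only this toy test, so they should not be read as reproducing the article's figures.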
format Online
Article
Text
id pubmed-6343276
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-63432762019-01-24 Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico Zhang, Xinyuan Basile, Anna O. Pendergrass, Sarah A. Ritchie, Marylyn D. BMC Bioinformatics Research Article BACKGROUND: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. RESULTS: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. 
To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression. CONCLUSIONS: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2591-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-01-22 /pmc/articles/PMC6343276/ /pubmed/30669967 http://dx.doi.org/10.1186/s12859-018-2591-6 Text en © The Author(s). 2019 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Zhang, Xinyuan
Basile, Anna O.
Pendergrass, Sarah A.
Ritchie, Marylyn D.
Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico
title Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6343276/
https://www.ncbi.nlm.nih.gov/pubmed/30669967
http://dx.doi.org/10.1186/s12859-018-2591-6