
simGWAS: a fast method for simulation of large scale case–control GWAS summary statistics

MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some ‘truth’ is known. As GWAS increase in size, so does the computational complexit...


Bibliographic Details
Main Authors: Fortune, Mary D; Wallace, Chris
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546134/
https://www.ncbi.nlm.nih.gov/pubmed/30371734
http://dx.doi.org/10.1093/bioinformatics/bty898
author Fortune, Mary D
Wallace, Chris
collection PubMed
description MOTIVATION: Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some ‘truth’ is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. RESULTS: We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6546134
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6546134 2019-06-13 Bioinformatics Original Papers Oxford University Press 2019-06-01 2018-10-29 /pmc/articles/PMC6546134/ /pubmed/30371734 http://dx.doi.org/10.1093/bioinformatics/bty898 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title simGWAS: a fast method for simulation of large scale case–control GWAS summary statistics
topic Original Papers