Cargando…

Simulating autosomal genotypes with realistic linkage disequilibrium and a spiked-in genetic effect

BACKGROUND: To evaluate statistical methods for genome-wide genetic analyses, one needs to be able to simulate realistic genotypes. We here describe a method, applicable to a broad range of association study designs, that can simulate autosome-wide single-nucleotide polymorphism data with realistic...

Descripción completa

Detalles Bibliográficos
Autores principales: Shi, M., Umbach, D. M., Wise, A. S., Weinberg, C. R.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5749028/
https://www.ncbi.nlm.nih.gov/pubmed/29291710
http://dx.doi.org/10.1186/s12859-017-2004-2
Descripción
Sumario:BACKGROUND: To evaluate statistical methods for genome-wide genetic analyses, one needs to be able to simulate realistic genotypes. We here describe a method, applicable to a broad range of association study designs, that can simulate autosome-wide single-nucleotide polymorphism data with realistic linkage disequilibrium and with spiked in, user-specified, single or multi-SNP causal effects. RESULTS: Our construction uses existing genome-wide association data from unrelated case-parent triads, augmented by including a hypothetical complement triad for each triad (same parents but with a hypothetical offspring who carries the non-transmitted parental alleles). We assign offspring qualitative or quantitative traits probabilistically through a specified risk model and show that our approach destroys the risk signals from the original data. Our method can simulate genetically homogeneous or stratified populations and can simulate case-parents studies, case-control studies, case-only studies, or studies of quantitative traits. We show that allele frequencies and linkage disequilibrium structure in the original genome-wide association sample are preserved in the simulated data. We have implemented our method in an R package (TriadSim) which is freely available at the comprehensive R archive network. CONCLUSION: We have proposed a method for simulating genome-wide SNP data with realistic linkage disequilibrium. Our method will be useful for developing statistical methods for studying genetic associations, including higher order effects like epistasis and gene by environment interactions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-2004-2) contains supplementary material, which is available to authorized users.