Accounting for long-range correlations in genome-wide simulations of large cohorts

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individ...

Bibliographic Details
Main authors: Nelson, Dominic; Kelleher, Jerome; Ragsdale, Aaron P.; Moreau, Claudia; McVean, Gil; Gravel, Simon
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266353/
https://www.ncbi.nlm.nih.gov/pubmed/32369493
http://dx.doi.org/10.1371/journal.pgen.1008619
_version_ 1783541292559499264
author Nelson, Dominic
Kelleher, Jerome
Ragsdale, Aaron P.
Moreau, Claudia
McVean, Gil
Gravel, Simon
author_sort Nelson, Dominic
collection PubMed
description Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.
format Online
Article
Text
id pubmed-7266353
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-72663532020-06-10 Accounting for long-range correlations in genome-wide simulations of large cohorts Nelson, Dominic Kelleher, Jerome Ragsdale, Aaron P. Moreau, Claudia McVean, Gil Gravel, Simon PLoS Genet Research Article Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past. Public Library of Science 2020-05-05 /pmc/articles/PMC7266353/ /pubmed/32369493 http://dx.doi.org/10.1371/journal.pgen.1008619 Text en © 2020 Nelson et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Accounting for long-range correlations in genome-wide simulations of large cohorts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266353/
https://www.ncbi.nlm.nih.gov/pubmed/32369493
http://dx.doi.org/10.1371/journal.pgen.1008619