
Sampling strategies for frequency spectrum-based population genomic inference

BACKGROUND: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likeliho...


Bibliographic Details
Main authors: Robinson, John D; Coffman, Alec J; Hickerson, Michael J; Gutenkunst, Ryan N
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269862/
https://www.ncbi.nlm.nih.gov/pubmed/25471595
http://dx.doi.org/10.1186/s12862-014-0254-4
_version_ 1782349403156119552
author Robinson, John D
Coffman, Alec J
Hickerson, Michael J
Gutenkunst, Ryan N
author_facet Robinson, John D
Coffman, Alec J
Hickerson, Michael J
Gutenkunst, Ryan N
author_sort Robinson, John D
collection PubMed
description BACKGROUND: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module δaδi, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models. RESULTS: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 - 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. CONCLUSIONS: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-014-0254-4) contains supplementary material, which is available to authorized users.
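The two quantities the abstract relies on, the AFS itself and the root mean squared error of parameter estimates, can be illustrated with a minimal Python sketch. This is not the δaδi API and not the authors' pipeline; the function names `allele_frequency_spectrum` and `rmse`, and all toy numbers, are hypothetical and chosen only for illustration.

```python
import math


def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Build a one-population AFS from per-locus derived-allele counts.

    Entry k of the result is the number of loci at which the derived
    variant was observed exactly k times among n_chromosomes sampled
    chromosomes. (Hypothetical helper, not part of any library.)
    """
    afs = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"derived count {k} outside 0..{n_chromosomes}")
        afs[k] += 1
    return afs


def rmse(estimates, true_value):
    """Root mean squared error of repeated estimates around a known value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data (assumed): derived-allele counts at 8 SNP loci, 5 chromosomes sampled.
toy_counts = [1, 1, 2, 3, 1, 4, 2, 1]
toy_afs = allele_frequency_spectrum(toy_counts, n_chromosomes=5)

# Toy data (assumed): four estimates of a parameter whose simulated value is 1.0.
toy_rmse = rmse([0.9, 1.1, 1.2, 0.8], true_value=1.0)
```

A larger sample of individuals gives a longer spectrum (more frequency classes), which is the sense in which the abstract speaks of "larger AFS".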
format Online
Article
Text
id pubmed-4269862
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-42698622014-12-18 Sampling strategies for frequency spectrum-based population genomic inference Robinson, John D Coffman, Alec J Hickerson, Michael J Gutenkunst, Ryan N BMC Evol Biol Research Article BACKGROUND: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module δaδi, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models. RESULTS: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 - 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. CONCLUSIONS: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. 
Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-014-0254-4) contains supplementary material, which is available to authorized users. BioMed Central 2014-12-04 /pmc/articles/PMC4269862/ /pubmed/25471595 http://dx.doi.org/10.1186/s12862-014-0254-4 Text en © Robinson et al.; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Sampling strategies for frequency spectrum-based population genomic inference
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269862/
https://www.ncbi.nlm.nih.gov/pubmed/25471595
http://dx.doi.org/10.1186/s12862-014-0254-4