
Generalization of the Ewens sampling formula to arbitrary fitness landscapes

In considering evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually adopts a steady state characterized by mutatio...


Bibliographic Details
Main authors: Khromov, Pavel, Malliaris, Constantin D., Morozov, Alexandre V.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5764269/
https://www.ncbi.nlm.nih.gov/pubmed/29324850
http://dx.doi.org/10.1371/journal.pone.0190186
_version_ 1783292027994112000
author Khromov, Pavel
Malliaris, Constantin D.
Morozov, Alexandre V.
collection PubMed
description In considering evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on the class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can be in several distinct phenotypic states. This class of landscapes yields sampling probabilities that are computationally more tractable and can form a basis for inference of selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other evolutionary parameters from population data. We also carry out numerical simulations to challenge various approximations involved in deriving our sampling formulas, such as the infinite-allele limit and the “full connectivity” assumption inherent in the Ewens theory, in which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even if these assumptions are relaxed. 
Thus our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
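For orientation, the neutral baseline mentioned in the description can be written down directly. The sketch below evaluates the classical Ewens sampling formula, P(a_1, ..., a_n) = n! / (θ(θ+1)...(θ+n-1)) × ∏_j θ^{a_j} / (j^{a_j} a_j!), where a_j is the number of allele types seen exactly j times in a sample of size n and θ is the scaled mutation rate. The function name and argument layout are illustrative assumptions, and the code covers only the no-selection case, not the generalization developed in the article.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(counts, theta):
    """Neutral Ewens sampling probability of an allelic configuration.

    counts[j - 1] is a_j, the number of allele types observed exactly j
    times in the sample; theta is the scaled mutation rate.
    (Hypothetical helper for illustration; not taken from the article.)
    """
    # Sample size n = sum over j of j * a_j.
    n = sum(j * a for j, a in enumerate(counts, start=1))
    theta = Fraction(theta)

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i

    prob = Fraction(factorial(n)) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (Fraction(j) ** a * factorial(a))
    return prob


# Sample of size 3 with one singleton allele and one allele seen twice.
p = ewens_probability([1, 1, 0], theta=2)
```

Exact rational arithmetic is used here only so that small cases can be checked by hand; for realistic sample sizes a log-space floating-point version would be the more practical choice.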
format Online
Article
Text
id pubmed-5764269
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5764269 2018-01-23 Generalization of the Ewens sampling formula to arbitrary fitness landscapes Khromov, Pavel Malliaris, Constantin D. Morozov, Alexandre V. PLoS One Research Article Public Library of Science 2018-01-11 /pmc/articles/PMC5764269/ /pubmed/29324850 http://dx.doi.org/10.1371/journal.pone.0190186 Text en © 2018 Khromov et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Generalization of the Ewens sampling formula to arbitrary fitness landscapes
topic Research Article