Improved inference of site-specific positive selection under a generalized parametric codon model when there are multinucleotide mutations and multiple nonsynonymous rates

BACKGROUND: An excess of nonsynonymous substitutions, over neutrality, is considered evidence of positive Darwinian selection. Inference for proteins often relies on estimation of the nonsynonymous to synonymous ratio (ω = d(N)/d(S)) within a codon model. However, to ease computational difficulties,...

Bibliographic Details
Main Authors:	Dunn, Katherine A., Kenney, Toby, Gu, Hong, Bielawski, Joseph P.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6332903/ https://www.ncbi.nlm.nih.gov/pubmed/30642241 http://dx.doi.org/10.1186/s12862-018-1326-7

author	Dunn, Katherine A. Kenney, Toby Gu, Hong Bielawski, Joseph P.
collection	PubMed
description	BACKGROUND: An excess of nonsynonymous substitutions, over neutrality, is considered evidence of positive Darwinian selection. Inference for proteins often relies on estimation of the nonsynonymous to synonymous ratio (ω = d(N)/d(S)) within a codon model. However, to ease computational difficulties, ω is typically estimated assuming an idealized substitution process where (i) all nonsynonymous substitutions have the same rate (regardless of impact on organism fitness) and (ii) instantaneous double and triple (DT) nucleotide mutations have zero probability (despite evidence that they can occur). It follows that estimates of ω represent an imperfect summary of the intensity of selection, and that tests based on the ω > 1 threshold could be negatively impacted. RESULTS: We developed a general-purpose parametric (GPP) modelling framework for codons. This novel approach allows specification of all possible instantaneous codon substitutions, including multiple nonsynonymous rates (MNRs) and instantaneous DT nucleotide changes. Existing codon models are specified as special cases of the GPP model. We use GPP models to implement likelihood ratio tests for ω > 1 that accommodate MNRs and DT mutations. Through both simulation and real data analysis, we find that failure to model MNRs and DT mutations reduces power in some cases and inflates false positives in others. False positives under traditional M2a and M8 models were very sensitive to DT changes. This was exacerbated by the choice of frequency parameterization (GY vs. MG), with rates sometimes > 90% under MG. By including MNRs and DT mutations, accuracy and power were greatly improved under the GPP framework. However, we also find that over-parameterized models can perform less well, and this can contribute to degraded performance of LRTs. CONCLUSIONS: We suggest GPP models should be used alongside traditional codon models. Further, all codon models should be deployed within an experimental design that includes (i) assessing robustness to model assumptions, and (ii) investigation of non-standard behaviour of MLEs. As the goal of every analysis is to avoid false conclusions, more work is needed on model selection methods that consider both the increase in fit engendered by a model parameter and the degree to which that parameter is affected by un-modelled evolutionary processes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12862-018-1326-7) contains supplementary material, which is available to authorized users.
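The likelihood ratio tests for ω > 1 mentioned in the abstract compare a null codon model that disallows ω > 1 with a nested alternative that allows it. The following is a minimal sketch of that arithmetic only, not the authors' implementation. It assumes the conventional chi-square reference distribution with 2 degrees of freedom that is commonly used for the M1a vs. M2a and M7 vs. M8 comparisons; the function names and the log-likelihood values are hypothetical and chosen purely for illustration.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, closed form for even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int = 2):
    """Return (statistic, p-value) for two nested models.

    lnl_null: maximized log-likelihood of the null model (e.g. M1a or M7)
    lnl_alt:  maximized log-likelihood of the alternative (e.g. M2a or M8)
    df:       assumed degrees of freedom of the reference distribution
    """
    # Clamp at zero: optimizer noise can make the alternative fit slightly worse.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods, for illustration only (not from the article).
stat, p_value = likelihood_ratio_test(lnl_null=-2345.67, lnl_alt=-2340.12)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.4f}")
```

Because the abstract reports that un-modelled DT mutations can inflate false positives and that MLEs can behave non-standardly, a small p-value from this asymptotic approximation is best read as one piece of evidence to check against alternative model assumptions, not as a conclusive result.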
format	Online Article Text
id	pubmed-6332903
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6332903 2019-01-23 Improved inference of site-specific positive selection under a generalized parametric codon model when there are multinucleotide mutations and multiple nonsynonymous rates Dunn, Katherine A. Kenney, Toby Gu, Hong Bielawski, Joseph P. BMC Evol Biol Methodology Article BioMed Central 2019-01-14 /pmc/articles/PMC6332903/ /pubmed/30642241 http://dx.doi.org/10.1186/s12862-018-1326-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Improved inference of site-specific positive selection under a generalized parametric codon model when there are multinucleotide mutations and multiple nonsynonymous rates
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6332903/ https://www.ncbi.nlm.nih.gov/pubmed/30642241 http://dx.doi.org/10.1186/s12862-018-1326-7
