
Standard Codon Substitution Models Overestimate Purifying Selection for Nonstationary Data

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions....


Bibliographic Details
Main authors: Kaehler, Benjamin D., Yap, Von Bing, Huttley, Gavin A.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5381540/
https://www.ncbi.nlm.nih.gov/pubmed/28175284
http://dx.doi.org/10.1093/gbe/evw308
_version_ 1782519951320416256
author Kaehler, Benjamin D.
Yap, Von Bing
Huttley, Gavin A.
collection PubMed
description Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of nonsynonymous to synonymous rates of substitution tends to be underestimated over three data sets of mammals, vertebrates, and insects. Our basis for comparison is a nonstationary codon substitution model that allows sequence composition to change. Goodness-of-fit results demonstrate that our new model tends to fit the data better. Direct measurement of nonstationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.
format Online
Article
Text
id pubmed-5381540
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5381540 2017-04-10 Genome Biol Evol Research Article Oxford University Press 2017-01-05 /pmc/articles/PMC5381540/ /pubmed/28175284 http://dx.doi.org/10.1093/gbe/evw308 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Standard Codon Substitution Models Overestimate Purifying Selection for Nonstationary Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5381540/
https://www.ncbi.nlm.nih.gov/pubmed/28175284
http://dx.doi.org/10.1093/gbe/evw308