Large-Scale Comparative Analysis of Codon Models Accounting for Protein and Nucleotide Selection

Bibliographic Details
Main Authors: Davydov, Iakov I, Salamin, Nicolas, Robinson-Rechavi, Marc
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Methods
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526913/
https://www.ncbi.nlm.nih.gov/pubmed/30847475
http://dx.doi.org/10.1093/molbev/msz048
author Davydov, Iakov I
Salamin, Nicolas
Robinson-Rechavi, Marc
collection PubMed
description There are numerous sources of variation in the rate of synonymous substitutions inside genes, such as direct selection on the nucleotide sequence, or mutation rate variation. Yet scans for positive selection rely on codon models which incorporate an assumption of effectively neutral synonymous substitution rate, constant between sites of each gene. Here we perform a large-scale comparison of approaches which incorporate codon substitution rate variation and propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on positive selection inference. More than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of uniform synonymous substitution rate. We propose a new model which is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in the highly recombining regions, and we propose that recombination and mutation rate variation, such as high CpG mutation rate, are the two main sources of nucleotide rate variation. Although we detect fewer genes under positive selection in Drosophila than without rate variation, the genes which we detect contain a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.
format Online
Article
Text
id pubmed-6526913
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6526913 2019-05-28 Mol Biol Evol Methods Oxford University Press 2019-06 2019-03-07 /pmc/articles/PMC6526913/ /pubmed/30847475 http://dx.doi.org/10.1093/molbev/msz048 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Large-Scale Comparative Analysis of Codon Models Accounting for Protein and Nucleotide Selection
topic Methods