
Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses

Bibliographic Details
Main authors: Lucaci, Alexander G; Zehr, Jordan D; Enard, David; Thornton, Joseph W; Kosakovsky Pond, Sergei L
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336034/
https://www.ncbi.nlm.nih.gov/pubmed/37395787
http://dx.doi.org/10.1093/molbev/msad150
collection PubMed
description Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled too crudely, estimates of key model parameters can become biased, often systematically, leading to poor statistical performance. Previous work established that failing to accommodate multinucleotide (or multihit, MH) substitutions strongly biases [Formula: see text]-based inference toward false-positive detection of episodic diversifying selection, as does failing to model variation in the rate of synonymous substitution among sites (SRV). Here, we develop an integrated analytical framework and software tools that simultaneously incorporate these sources of evolutionary complexity into selection analyses. We find that both MH and SRV are ubiquitous in empirical alignments, and that incorporating them strongly affects both whether positive selection is detected ([Formula: see text]-fold reduction) and the distributions of inferred evolutionary rates. Using simulation studies, we show that this effect is not attributable to reduced statistical power caused by using a more complex model. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of an alignment provide support for positive selection, we show that MH substitutions occurring along shorter branches of the tree explain a significant fraction of the discrepant results in selection detection. Our results add to the growing body of literature that reexamines decades-old modeling assumptions (including the absence of MH substitutions) and finds them problematic for comparative genomic data analysis.
Because multinucleotide substitutions have a significant impact on the detection of natural selection even at the level of an entire gene, we recommend that selection analyses of this type routinely include them. To facilitate this, we developed, implemented, and benchmarked a simple, well-performing model-testing framework for selection detection that screens an alignment for positive selection while accounting for two biologically important confounding processes: site-to-site synonymous rate variation and instantaneous multinucleotide substitutions.
id pubmed-10336034
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10336034 2023-07-13 Mol Biol Evol Methods Oxford University Press 2023-07-03 /pmc/articles/PMC10336034/ /pubmed/37395787 http://dx.doi.org/10.1093/molbev/msad150 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods