Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses
Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not...
Main Authors: | Lucaci, Alexander G; Zehr, Jordan D; Enard, David; Thornton, Joseph W; Kosakovsky Pond, Sergei L |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Methods |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336034/ https://www.ncbi.nlm.nih.gov/pubmed/37395787 http://dx.doi.org/10.1093/molbev/msad150 |
_version_ | 1785071121021796352 |
---|---|
author | Lucaci, Alexander G Zehr, Jordan D Enard, David Thornton, Joseph W Kosakovsky Pond, Sergei L |
author_sort | Lucaci, Alexander G |
collection | PubMed |
description | Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled with too crude of a simplification, estimates of key model parameters can become biased, often systematically, and lead to poor statistical performance. Previous work established that failing to accommodate multinucleotide (or multihit, MH) substitutions strongly biases [Formula: see text]-based inference towards false-positive inferences of diversifying episodic selection, as does failing to model variation in the rate of synonymous substitution (SRV) among sites. Here, we develop an integrated analytical framework and software tools to simultaneously incorporate these sources of evolutionary complexity into selection analyses. We found that both MH and SRV are ubiquitous in empirical alignments, and incorporating them has a strong effect on whether or not positive selection is detected ([Formula: see text]-fold reduction) and on the distributions of inferred evolutionary rates. With simulation studies, we show that this effect is not attributable to reduced statistical power caused by using a more complex model. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of the alignment provide support for positive selection, we show that MH substitutions occurring along shorter branches in the tree explain a significant fraction of discrepant results in selection detection. Our results add to the growing body of literature which examines decades-old modeling assumptions (including MH) and finds them to be problematic for comparative genomic data analysis. 
Because multinucleotide substitutions have a significant impact on natural selection detection even at the level of an entire gene, we recommend that selection analyses of this type consider their inclusion as a matter of routine. To facilitate this procedure, we developed, implemented, and benchmarked a simple and well-performing model testing selection detection framework able to screen an alignment for positive selection with two biologically important confounding processes: site-to-site synonymous rate variation, and multinucleotide instantaneous substitutions. |
format | Online Article Text |
id | pubmed-10336034 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10336034 2023-07-13 Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses Lucaci, Alexander G Zehr, Jordan D Enard, David Thornton, Joseph W Kosakovsky Pond, Sergei L Mol Biol Evol Methods Oxford University Press 2023-07-03 /pmc/articles/PMC10336034/ /pubmed/37395787 http://dx.doi.org/10.1093/molbev/msad150 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Lucaci, Alexander G Zehr, Jordan D Enard, David Thornton, Joseph W Kosakovsky Pond, Sergei L Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses |
title | Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses |
title_sort | evolutionary shortcuts via multinucleotide substitutions and their impact on natural selection analyses |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336034/ https://www.ncbi.nlm.nih.gov/pubmed/37395787 http://dx.doi.org/10.1093/molbev/msad150 |