
The Structure of Evolutionary Model Space for Proteins across the Tree of Life

SIMPLE SUMMARY: The relative rates of amino acid substitution over evolutionary time reflect the chemical properties of amino acids. Substitutions that result in an amino acid similar to an ancestral residue accumulate more rapidly than those resulting in a dissimilar amino acid. The substitution ra...


Bibliographic Details
Main Authors: Scolaro, Gabrielle E., Braun, Edward L.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9952988/
https://www.ncbi.nlm.nih.gov/pubmed/36829559
http://dx.doi.org/10.3390/biology12020282
_version_ 1784893766249742336
author Scolaro, Gabrielle E.
Braun, Edward L.
author_facet Scolaro, Gabrielle E.
Braun, Edward L.
author_sort Scolaro, Gabrielle E.
collection PubMed
description SIMPLE SUMMARY: The relative rates of amino acid substitution over evolutionary time reflect the chemical properties of amino acids. Substitutions that result in an amino acid similar to an ancestral residue accumulate more rapidly than those resulting in a dissimilar amino acid. The substitution rates for each amino acid pair are the parameters in models of evolutionary change for proteins. Although the best-fitting model of protein evolution is known to differ among taxa, a comprehensive picture of model changes across the tree of life is not available. In principle, models of protein change might reflect evolutionary history (i.e., closely related taxa have similar models) or the environment (i.e., taxa living in similar environments have similar models). We estimated models of amino acid evolution for organisms across the tree of life, finding evidence that history and the environment have both contributed to model differences. Bacterial models differed from archaeal and eukaryotic models. Models for Halobacteriaceae (archaea that live in highly saline environments) and Thermoprotei (a group of thermophilic archaea) were found to be very distinctive. The rates of substitution for pairs of aromatic amino acids were especially variable. Overall, these results paint a picture of the “evolutionary model space” for proteins across the tree of life.
ABSTRACT: The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the “model space” for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
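The abstract mentions two operations: measuring distances among clade-specific models and classifying an alignment by which model fits it best. The sketch below illustrates one way these could be set up. It is not the authors' method: the record does not state the distance metric or the fit criterion, so the Euclidean distance on mean-normalized exchangeabilities and the highest-log-likelihood rule are assumptions, and all function names and toy values are hypothetical.

```python
import math
from itertools import combinations

# The 20 standard amino acids give 190 unordered pairs, i.e. 190
# relative exchangeabilities per model.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
PAIRS = list(combinations(AMINO_ACIDS, 2))


def normalize(exch):
    """Rescale exchangeabilities to a mean of 1 (their overall scale is arbitrary)."""
    mean = sum(exch[p] for p in PAIRS) / len(PAIRS)
    return {p: exch[p] / mean for p in PAIRS}


def model_distance(a, b):
    """Euclidean distance between two models (assumed metric, not from the source)."""
    na, nb = normalize(a), normalize(b)
    return math.sqrt(sum((na[p] - nb[p]) ** 2 for p in PAIRS))


def classify_by_fit(log_likelihoods):
    """Pick the model with the highest log-likelihood (assumed fit criterion)."""
    return max(log_likelihoods, key=log_likelihoods.get)


# Toy, made-up inputs purely to exercise the functions.
flat = {p: 1.0 for p in PAIRS}
scaled = {p: 2.0 for p in PAIRS}           # same model at a different scale
aromatic_shift = dict(flat)
aromatic_shift[("F", "Y")] = 5.0           # one aromatic pair changed

same = model_distance(flat, scaled)
shifted = model_distance(flat, aromatic_shift)
best = classify_by_fit({"bacterial": -1234.5, "archaeal": -1250.0})
```

Normalizing before comparing means two models that differ only by a constant factor have zero distance, so any remaining distance reflects differences in relative exchangeabilities.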
format Online
Article
Text
id pubmed-9952988
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9952988 2023-02-25 The Structure of Evolutionary Model Space for Proteins across the Tree of Life Scolaro, Gabrielle E. Braun, Edward L. Biology (Basel) Article MDPI 2023-02-10 /pmc/articles/PMC9952988/ /pubmed/36829559 http://dx.doi.org/10.3390/biology12020282 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Scolaro, Gabrielle E.
Braun, Edward L.
The Structure of Evolutionary Model Space for Proteins across the Tree of Life
title The Structure of Evolutionary Model Space for Proteins across the Tree of Life
title_full The Structure of Evolutionary Model Space for Proteins across the Tree of Life
title_fullStr The Structure of Evolutionary Model Space for Proteins across the Tree of Life
title_full_unstemmed The Structure of Evolutionary Model Space for Proteins across the Tree of Life
title_short The Structure of Evolutionary Model Space for Proteins across the Tree of Life
title_sort structure of evolutionary model space for proteins across the tree of life
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9952988/
https://www.ncbi.nlm.nih.gov/pubmed/36829559
http://dx.doi.org/10.3390/biology12020282
work_keys_str_mv AT scolarogabriellee thestructureofevolutionarymodelspaceforproteinsacrossthetreeoflife
AT braunedwardl thestructureofevolutionarymodelspaceforproteinsacrossthetreeoflife
AT scolarogabriellee structureofevolutionarymodelspaceforproteinsacrossthetreeoflife
AT braunedwardl structureofevolutionarymodelspaceforproteinsacrossthetreeoflife