Cargando…

Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites

Models of amino acid replacement are central to modern phylogenetic inference, particularly so when dealing with deep evolutionary relationships. Traditionally, a single, empirically derived matrix was utilized, so as to keep the degrees-of-freedom of the inference low, and focused on topology. With...

Descripción completa

Detalles Bibliográficos
Autores principales: Bujaki, Thomas, Rodrigue, Nicolas
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer US 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643205/
https://www.ncbi.nlm.nih.gov/pubmed/36207534
http://dx.doi.org/10.1007/s00239-022-10076-y
_version_ 1784826469543837696
author Bujaki, Thomas
Rodrigue, Nicolas
author_facet Bujaki, Thomas
Rodrigue, Nicolas
author_sort Bujaki, Thomas
collection PubMed
description Models of amino acid replacement are central to modern phylogenetic inference, particularly so when dealing with deep evolutionary relationships. Traditionally, a single, empirically derived matrix was utilized, so as to keep the degrees-of-freedom of the inference low, and focused on topology. With the growing size of data sets, however, an amino acid-level general-time-reversible matrix has become increasingly feasible, treating amino acid exchangeabilities and frequencies as free parameters. Moreover, models based on mixtures of multiple matrices are increasingly utilized, in order to account for across-site heterogeneities in amino acid requirements of proteins. Such models exist as finite empirically-derived amino acid profile (or frequency) mixtures, free finite mixtures, as well as free Dirichlet process-based infinite mixtures. All of these approaches are typically combined with a gamma-distributed rates-across-sites model. In spite of the availability of these different aspects to modeling the amino acid replacement process, no study has systematically quantified their relative contributions to their predictive power of real data. Here, we use Bayesian cross-validation to establish a detailed comparison, while activating/deactivating each modeling aspect. For most data sets studied, we find that amino acid mixture models can outrank all single-matrix models, even when the latter include gamma-distributed rates and the former do not. We also find that free finite mixtures consistently outperform empirical finite mixtures. Finally, the Dirichlet process-based mixture model tends to outperform all other approaches.
format Online
Article
Text
id pubmed-9643205
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-96432052022-11-15 Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites Bujaki, Thomas Rodrigue, Nicolas J Mol Evol Original Article Models of amino acid replacement are central to modern phylogenetic inference, particularly so when dealing with deep evolutionary relationships. Traditionally, a single, empirically derived matrix was utilized, so as to keep the degrees-of-freedom of the inference low, and focused on topology. With the growing size of data sets, however, an amino acid-level general-time-reversible matrix has become increasingly feasible, treating amino acid exchangeabilities and frequencies as free parameters. Moreover, models based on mixtures of multiple matrices are increasingly utilized, in order to account for across-site heterogeneities in amino acid requirements of proteins. Such models exist as finite empirically-derived amino acid profile (or frequency) mixtures, free finite mixtures, as well as free Dirichlet process-based infinite mixtures. All of these approaches are typically combined with a gamma-distributed rates-across-sites model. In spite of the availability of these different aspects to modeling the amino acid replacement process, no study has systematically quantified their relative contributions to their predictive power of real data. Here, we use Bayesian cross-validation to establish a detailed comparison, while activating/deactivating each modeling aspect. For most data sets studied, we find that amino acid mixture models can outrank all single-matrix models, even when the latter include gamma-distributed rates and the former do not. We also find that free finite mixtures consistently outperform empirical finite mixtures. Finally, the Dirichlet process-based mixture model tends to outperform all other approaches. Springer US 2022-10-07 2022 /pmc/articles/PMC9643205/ /pubmed/36207534 http://dx.doi.org/10.1007/s00239-022-10076-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Original Article
Bujaki, Thomas
Rodrigue, Nicolas
Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
title Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
title_full Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
title_fullStr Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
title_full_unstemmed Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
title_short Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
title_sort bayesian cross-validation comparison of amino acid replacement models: contrasting profile mixtures, pairwise exchangeabilities, and gamma-distributed rates-across-sites
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643205/
https://www.ncbi.nlm.nih.gov/pubmed/36207534
http://dx.doi.org/10.1007/s00239-022-10076-y
work_keys_str_mv AT bujakithomas bayesiancrossvalidationcomparisonofaminoacidreplacementmodelscontrastingprofilemixturespairwiseexchangeabilitiesandgammadistributedratesacrosssites
AT rodriguenicolas bayesiancrossvalidationcomparisonofaminoacidreplacementmodelscontrastingprofilemixturespairwiseexchangeabilitiesandgammadistributedratesacrosssites