Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites
Models of amino acid replacement are central to modern phylogenetic inference, particularly so when dealing with deep evolutionary relationships. Traditionally, a single, empirically derived matrix was utilized, so as to keep the degrees-of-freedom of the inference low, and focused on topology. With...
Main Authors: | Bujaki, Thomas, Rodrigue, Nicolas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643205/ https://www.ncbi.nlm.nih.gov/pubmed/36207534 http://dx.doi.org/10.1007/s00239-022-10076-y |
_version_ | 1784826469543837696 |
---|---|
author | Bujaki, Thomas Rodrigue, Nicolas |
author_facet | Bujaki, Thomas Rodrigue, Nicolas |
author_sort | Bujaki, Thomas |
collection | PubMed |
description | Models of amino acid replacement are central to modern phylogenetic inference, particularly so when dealing with deep evolutionary relationships. Traditionally, a single, empirically derived matrix was utilized, so as to keep the degrees-of-freedom of the inference low, and focused on topology. With the growing size of data sets, however, an amino acid-level general-time-reversible matrix has become increasingly feasible, treating amino acid exchangeabilities and frequencies as free parameters. Moreover, models based on mixtures of multiple matrices are increasingly utilized, in order to account for across-site heterogeneities in amino acid requirements of proteins. Such models exist as finite empirically-derived amino acid profile (or frequency) mixtures, free finite mixtures, as well as free Dirichlet process-based infinite mixtures. All of these approaches are typically combined with a gamma-distributed rates-across-sites model. In spite of the availability of these different aspects to modeling the amino acid replacement process, no study has systematically quantified their relative contributions to their predictive power of real data. Here, we use Bayesian cross-validation to establish a detailed comparison, while activating/deactivating each modeling aspect. For most data sets studied, we find that amino acid mixture models can outrank all single-matrix models, even when the latter include gamma-distributed rates and the former do not. We also find that free finite mixtures consistently outperform empirical finite mixtures. Finally, the Dirichlet process-based mixture model tends to outperform all other approaches. |
format | Online Article Text |
id | pubmed-9643205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-9643205 2022-11-15 Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites Bujaki, Thomas Rodrigue, Nicolas J Mol Evol Original Article Springer US 2022-10-07 2022 /pmc/articles/PMC9643205/ /pubmed/36207534 http://dx.doi.org/10.1007/s00239-022-10076-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | Bayesian Cross-Validation Comparison of Amino Acid Replacement Models: Contrasting Profile Mixtures, Pairwise Exchangeabilities, and Gamma-Distributed Rates-Across-Sites |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9643205/ https://www.ncbi.nlm.nih.gov/pubmed/36207534 http://dx.doi.org/10.1007/s00239-022-10076-y |