Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics
It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic crite...
Main author: | Spielman, Stephanie J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7306691/ https://www.ncbi.nlm.nih.gov/pubmed/32191313 http://dx.doi.org/10.1093/molbev/msaa075 |
_version_ | 1783548706370355200 |
---|---|
author | Spielman, Stephanie J |
author_facet | Spielman, Stephanie J |
author_sort | Spielman, Stephanie J |
collection | PubMed |
description | It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy. |
format | Online Article Text |
id | pubmed-7306691 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7306691 2020-06-29 Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics Spielman, Stephanie J Mol Biol Evol Methods It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.
Oxford University Press 2020-07 2020-04-02 /pmc/articles/PMC7306691/ /pubmed/32191313 http://dx.doi.org/10.1093/molbev/msaa075 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Spielman, Stephanie J Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics |
title | Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics |
title_full | Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics |
title_fullStr | Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics |
title_full_unstemmed | Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics |
title_short | Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics |
title_sort | relative model fit does not predict topological accuracy in single-gene protein phylogenetics |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7306691/ https://www.ncbi.nlm.nih.gov/pubmed/32191313 http://dx.doi.org/10.1093/molbev/msaa075 |
work_keys_str_mv | AT spielmanstephaniej relativemodelfitdoesnotpredicttopologicalaccuracyinsinglegeneproteinphylogenetics |
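
The description above says relative model selection "ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion". A minimal Python sketch of that ranking step follows, assuming the criterion is AIC or BIC (the record does not say which the article uses). The model names, log-likelihoods, parameter counts, and alignment length are hypothetical placeholders chosen for illustration; they are not values from the article.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites, criterion="aic"):
    """Return (model, score) pairs sorted from best to worst relative fit.

    fits maps a model name to (log_likelihood, n_free_parameters).
    """
    scored = []
    for name, (lnl, k) in fits.items():
        score = aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical fits for one alignment: (log-likelihood, free parameters).
# These numbers are made up for illustration only.
hypothetical_fits = {
    "LG": (-10250.0, 41),
    "WAG": (-10310.0, 41),
    "GTR": (-10180.0, 230),
}

for model, score in rank_models(hypothetical_fits, n_sites=300):
    print(f"{model}\t{score:.1f}")
```

In this toy example the parameter-rich model has the highest likelihood but the worst AIC, because the penalty term grows with the number of free parameters. As the description concludes, a better rank under such a criterion is not a reliable proxy for topological accuracy.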