Cargando…

More on the Best Evolutionary Rate for Phylogenetic Analysis

The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phyl...

Descripción completa

Detalles Bibliográficos
Autores principales: Klopfstein, Seraina, Massingham, Tim, Goldman, Nick
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5790136/
https://www.ncbi.nlm.nih.gov/pubmed/28595363
http://dx.doi.org/10.1093/sysbio/syx051
_version_ 1783296407671668736
author Klopfstein, Seraina
Massingham, Tim
Goldman, Nick
author_facet Klopfstein, Seraina
Massingham, Tim
Goldman, Nick
author_sort Klopfstein, Seraina
collection PubMed
description The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phylogenomic projects must thus shift to more careful experimental design. Even though we still have a limited understanding of what are the best predictors of the phylogenetic informativeness of a gene, there is wide agreement that one key factor is its evolutionary rate; but there is no consensus as to whether the rates derived as optimal in various analytical, empirical, and simulation approaches have any general applicability. We here use simulations to infer optimal rates in a set of realistic phylogenetic scenarios with varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we study the relationship between the optimal rate and rate variation among sites and among lineages. Finally, we examine how well the predictions made by a range of experimental design methods correlate with the observed performance in our simulations. We find that the optimal level of divergence is surprisingly robust to differences in taxon sampling and even to among-site and among-lineage rate variation as often encountered in empirical data sets. This finding encourages the use of methods that rely on a single optimal rate to predict a gene’s utility. Focusing on correct recovery either of the most basal node in the phylogeny or of the entire topology, the optimal rate is about 0.45 substitutions from root to tip in average Yule trees and about 0.2 in difficult trees with short basal and long-apical branches, but all rates leading to divergence levels between about 0.1 and 0.5 perform reasonably well. Testing the performance of six methods that can be used to predict a gene’s utility against our simulation results, we find that the probability of resolution, signal-noise analysis, and Fisher information are good predictors of phylogenetic informativeness, but they require specification of at least part of a model tree. Likelihood quartet mapping also shows very good performance but only requires sequence alignments and is thus applicable without making assumptions about the phylogeny. Despite them being the most commonly used methods for experimental design, geometric quartet mapping and the integration of phylogenetic informativeness curves perform rather poorly in our comparison. Instead of derived predictors of phylogenetic informativeness, we suggest that the number of sites in a gene that evolve at near-optimal rates (as inferred here) could be used directly to prioritize genes for phylogenetic inference. In combination with measures of model fit, especially with respect to compositional biases and among-site and among-lineage rate variation, such an approach has the potential to greatly improve marker choice and should be tested on empirical data.
format Online
Article
Text
id pubmed-5790136
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-57901362018-02-05 More on the Best Evolutionary Rate for Phylogenetic Analysis Klopfstein, Seraina Massingham, Tim Goldman, Nick Syst Biol Regular Articles The accumulation of genome-scale molecular data sets for nonmodel taxa brings us ever closer to resolving the tree of life of all living organisms. However, despite the depth of data available, a number of studies that each used thousands of genes have reported conflicting results. The focus of phylogenomic projects must thus shift to more careful experimental design. Even though we still have a limited understanding of what are the best predictors of the phylogenetic informativeness of a gene, there is wide agreement that one key factor is its evolutionary rate; but there is no consensus as to whether the rates derived as optimal in various analytical, empirical, and simulation approaches have any general applicability. We here use simulations to infer optimal rates in a set of realistic phylogenetic scenarios with varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we study the relationship between the optimal rate and rate variation among sites and among lineages. Finally, we examine how well the predictions made by a range of experimental design methods correlate with the observed performance in our simulations. We find that the optimal level of divergence is surprisingly robust to differences in taxon sampling and even to among-site and among-lineage rate variation as often encountered in empirical data sets. This finding encourages the use of methods that rely on a single optimal rate to predict a gene’s utility. Focusing on correct recovery either of the most basal node in the phylogeny or of the entire topology, the optimal rate is about 0.45 substitutions from root to tip in average Yule trees and about 0.2 in difficult trees with short basal and long-apical branches, but all rates leading to divergence levels between about 0.1 and 0.5 perform reasonably well. Testing the performance of six methods that can be used to predict a gene’s utility against our simulation results, we find that the probability of resolution, signal-noise analysis, and Fisher information are good predictors of phylogenetic informativeness, but they require specification of at least part of a model tree. Likelihood quartet mapping also shows very good performance but only requires sequence alignments and is thus applicable without making assumptions about the phylogeny. Despite them being the most commonly used methods for experimental design, geometric quartet mapping and the integration of phylogenetic informativeness curves perform rather poorly in our comparison. Instead of derived predictors of phylogenetic informativeness, we suggest that the number of sites in a gene that evolve at near-optimal rates (as inferred here) could be used directly to prioritize genes for phylogenetic inference. In combination with measures of model fit, especially with respect to compositional biases and among-site and among-lineage rate variation, such an approach has the potential to greatly improve marker choice and should be tested on empirical data. Oxford University Press 2017-09 2017-06-08 /pmc/articles/PMC5790136/ /pubmed/28595363 http://dx.doi.org/10.1093/sysbio/syx051 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biology. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Regular Articles
Klopfstein, Seraina
Massingham, Tim
Goldman, Nick
More on the Best Evolutionary Rate for Phylogenetic Analysis
title More on the Best Evolutionary Rate for Phylogenetic Analysis
title_full More on the Best Evolutionary Rate for Phylogenetic Analysis
title_fullStr More on the Best Evolutionary Rate for Phylogenetic Analysis
title_full_unstemmed More on the Best Evolutionary Rate for Phylogenetic Analysis
title_short More on the Best Evolutionary Rate for Phylogenetic Analysis
title_sort more on the best evolutionary rate for phylogenetic analysis
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5790136/
https://www.ncbi.nlm.nih.gov/pubmed/28595363
http://dx.doi.org/10.1093/sysbio/syx051
work_keys_str_mv AT klopfsteinseraina moreonthebestevolutionaryrateforphylogeneticanalysis
AT massinghamtim moreonthebestevolutionaryrateforphylogeneticanalysis
AT goldmannick moreonthebestevolutionaryrateforphylogeneticanalysis