TreeSolve: Rapid Error-Correction of Microbial Gene Trees
Main authors: | , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197061/ http://dx.doi.org/10.1007/978-3-030-42266-0_10 |
Summary: | Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to guide the reconstruction of the gene tree. While such species-tree-aware gene tree reconstruction methods have been repeatedly shown to result in vastly more accurate gene trees, the most accurate of these methods often have prohibitively high computational costs. In this work, we introduce a highly computationally efficient and robust species-tree-aware method, named TreeSolve, for microbial gene tree reconstruction. TreeSolve works by collapsing weakly supported edges of the input gene tree, resulting in a non-binary gene tree, and then using new algorithms and techniques to optimally resolve the non-binary gene trees with respect to the given species tree in an appropriately and dynamically constrained search space. Using thousands of real and simulated gene trees, we demonstrate that TreeSolve significantly outperforms the best existing species-tree-aware methods for microbes in terms of accuracy, speed, or both. Crucially, TreeSolve also implicitly keeps track of multiple optimal gene tree reconstructions and can compute either a single best estimate of the gene tree or multiple distinct estimates. As we demonstrate, aggregating over multiple gene tree candidates helps distinguish between correct and incorrect parts of an error-corrected gene tree. Thus, TreeSolve not only enables rapid gene tree error-correction for large gene trees without compromising on accuracy, but also enables accounting for inference uncertainty. |
---|
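
The summary describes a first step in which weakly supported edges of the input gene tree are collapsed, yielding a non-binary tree. The following is a minimal Python sketch of that general idea only; it is not TreeSolve's implementation. The `Node` class, the `collapse_weak_edges` and `to_newick` helpers, and the 0.5 support threshold are all hypothetical names and values chosen for illustration, and the later step of resolving the non-binary tree against a species tree is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical rooted-tree node, for illustration only."""
    name: Optional[str] = None   # leaf label; None for internal nodes
    support: float = 1.0         # support of the edge above this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_weak_edges(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with every internal edge whose support
    is below `threshold` contracted, which may create multifurcations."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_weak_edges(child, threshold)
        if not collapsed.is_leaf() and collapsed.support < threshold:
            # Contract the weak edge: attach the grandchildren directly.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialize the topology only (no supports or branch lengths)."""
    if node.is_leaf():
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Toy gene tree ((A,B),(C,D)) with assumed supports 0.9 and 0.3.
tree = Node(children=[
    Node(support=0.9, children=[Node("A"), Node("B")]),
    Node(support=0.3, children=[Node("C"), Node("D")]),
])

collapsed_tree = collapse_weak_edges(tree, threshold=0.5)
print(to_newick(collapsed_tree))  # ((A,B),C,D)
```

In this toy example the clade (C,D) has support below the assumed threshold, so its edge is contracted and the root becomes a three-way multifurcation, while the well-supported clade (A,B) is kept.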