
TreeSolve: Rapid Error-Correction of Microbial Gene Trees

Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to...


Bibliographic Details
Main authors: Kordi, Misagh; Bansal, Mukul S.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197061/
http://dx.doi.org/10.1007/978-3-030-42266-0_10
_version_ 1783528809388048384
author Kordi, Misagh
Bansal, Mukul S.
author_facet Kordi, Misagh
Bansal, Mukul S.
author_sort Kordi, Misagh
collection PubMed
description Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to guide the reconstruction of the gene tree. While such species-tree-aware gene tree reconstruction methods have been repeatedly shown to result in vastly more accurate gene trees, the most accurate of these methods often have prohibitively high computational costs. In this work, we introduce a highly computationally efficient and robust species-tree-aware method, named TreeSolve, for microbial gene tree reconstruction. TreeSolve works by collapsing weakly supported edges of the input gene tree, resulting in a non-binary gene tree, and then using new algorithms and techniques to optimally resolve the non-binary gene trees with respect to the given species tree in an appropriately and dynamically constrained search space. Using thousands of real and simulated gene trees, we demonstrate that TreeSolve significantly outperforms the best existing species-tree-aware methods for microbes in terms of accuracy, speed, or both. Crucially, TreeSolve also implicitly keeps track of multiple optimal gene tree reconstructions and can compute either a single best estimate of the gene tree or multiple distinct estimates. As we demonstrate, aggregating over multiple gene tree candidates helps distinguish between correct and incorrect parts of an error-corrected gene tree. Thus, TreeSolve not only enables rapid gene tree error-correction for large gene trees without compromising on accuracy, but also enables accounting of inference uncertainty.
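The first step named in the description, collapsing weakly supported edges so that the gene tree becomes non-binary, can be illustrated with a minimal sketch. This is not TreeSolve's implementation: the `Node` class, the `collapse_weak_edges` and `to_newick` functions, the `demo_tree` helper, and the support threshold are all hypothetical names and values chosen for illustration, and the subsequent species-tree-aware resolution step is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `support` is the support of the edge above it."""
    name: Optional[str] = None
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_weak_edges(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which every internal edge whose support
    is below `threshold` has been contracted (assumed criterion)."""
    if node.is_leaf():
        return Node(name=node.name)
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_weak_edges(child, threshold)
        weak = (
            not child.is_leaf()
            and child.support is not None
            and child.support < threshold
        )
        if weak:
            # Contract the edge: the grandchildren attach directly to `node`.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(name=node.name, support=node.support, children=new_children)


def to_newick(node: Node) -> str:
    """Serialize the topology only (no supports or branch lengths)."""
    if node.is_leaf():
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


def demo_tree() -> Node:
    """Toy binary gene tree (((A,B),(C,D)),E) with made-up support values."""
    ab = Node(support=95, children=[Node("A"), Node("B")])
    cd = Node(support=40, children=[Node("C"), Node("D")])
    abcd = Node(support=80, children=[ab, cd])
    return Node(children=[abcd, Node("E")])


if __name__ == "__main__":
    print(to_newick(collapse_weak_edges(demo_tree(), threshold=70)))
```

With the toy supports above and a threshold of 70, only the edge above the (C,D) clade is contracted, leaving a node with three children. A resolution step would then choose among the binary refinements of that node, which is where the species tree comes in according to the description.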
format Online
Article
Text
id pubmed-7197061
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7197061 2020-05-04 TreeSolve: Rapid Error-Correction of Microbial Gene Trees Kordi, Misagh Bansal, Mukul S. Algorithms for Computational Biology Article
2020-02-01 /pmc/articles/PMC7197061/ http://dx.doi.org/10.1007/978-3-030-42266-0_10 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Kordi, Misagh
Bansal, Mukul S.
TreeSolve: Rapid Error-Correction of Microbial Gene Trees
title TreeSolve: Rapid Error-Correction of Microbial Gene Trees
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197061/
http://dx.doi.org/10.1007/978-3-030-42266-0_10