TreeSolve: Rapid Error-Correction of Microbial Gene Trees
Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to...
Main authors: | Kordi, Misagh; Bansal, Mukul S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197061/ http://dx.doi.org/10.1007/978-3-030-42266-0_10 |
_version_ | 1783528809388048384 |
---|---|
author | Kordi, Misagh Bansal, Mukul S. |
author_facet | Kordi, Misagh Bansal, Mukul S. |
author_sort | Kordi, Misagh |
collection | PubMed |
description | Gene tree reconstruction is an important problem in phylogenetics. However, gene sequences often lack sufficient information to confidently distinguish between competing gene tree topologies. To overcome this limitation, the best gene tree reconstruction methods use a known species tree topology to guide the reconstruction of the gene tree. While such species-tree-aware gene tree reconstruction methods have been repeatedly shown to result in vastly more accurate gene trees, the most accurate of these methods often have prohibitively high computational costs. In this work, we introduce a highly computationally efficient and robust species-tree-aware method, named TreeSolve, for microbial gene tree reconstruction. TreeSolve works by collapsing weakly supported edges of the input gene tree, resulting in a non-binary gene tree, and then using new algorithms and techniques to optimally resolve the non-binary gene trees with respect to the given species tree in an appropriately and dynamically constrained search space. Using thousands of real and simulated gene trees, we demonstrate that TreeSolve significantly outperforms the best existing species-tree-aware methods for microbes in terms of accuracy, speed, or both. Crucially, TreeSolve also implicitly keeps track of multiple optimal gene tree reconstructions and can compute either a single best estimate of the gene tree or multiple distinct estimates. As we demonstrate, aggregating over multiple gene tree candidates helps distinguish between correct and incorrect parts of an error-corrected gene tree. Thus, TreeSolve not only enables rapid gene tree error-correction for large gene trees without compromising on accuracy, but also enables accounting of inference uncertainty. |
format | Online Article Text |
id | pubmed-7197061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7197061 2020-05-04 TreeSolve: Rapid Error-Correction of Microbial Gene Trees Kordi, Misagh Bansal, Mukul S. Algorithms for Computational Biology Article
2020-02-01 /pmc/articles/PMC7197061/ http://dx.doi.org/10.1007/978-3-030-42266-0_10 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Kordi, Misagh Bansal, Mukul S. TreeSolve: Rapid Error-Correction of Microbial Gene Trees |
title | TreeSolve: Rapid Error-Correction of Microbial Gene Trees |
title_full | TreeSolve: Rapid Error-Correction of Microbial Gene Trees |
title_fullStr | TreeSolve: Rapid Error-Correction of Microbial Gene Trees |
title_full_unstemmed | TreeSolve: Rapid Error-Correction of Microbial Gene Trees |
title_short | TreeSolve: Rapid Error-Correction of Microbial Gene Trees |
title_sort | treesolve: rapid error-correction of microbial gene trees |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197061/ http://dx.doi.org/10.1007/978-3-030-42266-0_10 |
work_keys_str_mv | AT kordimisagh treesolverapiderrorcorrectionofmicrobialgenetrees AT bansalmukuls treesolverapiderrorcorrectionofmicrobialgenetrees |
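
The description in this record says that TreeSolve first collapses weakly supported edges of the input gene tree, which yields a non-binary gene tree. The sketch below illustrates only that first step in plain Python. It is not the authors' implementation: the `Node` class, the `collapse_weak_edges` and `to_newick` helpers, the example tree, and the support threshold of 80 are all assumptions made for illustration. The later step, resolving the non-binary tree against a species tree, is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Rooted gene-tree node (hypothetical type for this sketch).

    `support` is the support value of the edge above this node.
    """
    name: Optional[str] = None
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_weak_edges(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with weakly supported internal edges contracted.

    An internal child whose support is below `threshold` is removed and its
    children are attached directly to its parent, creating a multifurcation.
    Leaves and edges without a support value are always kept.
    """
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_weak_edges(child, threshold)
        weak = (
            not collapsed.is_leaf()
            and collapsed.support is not None
            and collapsed.support < threshold
        )
        if weak:
            # Contract the edge: splice the grandchildren into this node.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialize the topology only (no supports or branch lengths)."""
    if node.is_leaf():
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# Made-up example: the clade (c,(d,e)) has support 40, the others are well supported.
tree = Node(children=[
    Node(support=95, children=[Node("a"), Node("b")]),
    Node(support=40, children=[
        Node("c"),
        Node(support=90, children=[Node("d"), Node("e")]),
    ]),
])

collapsed = collapse_weak_edges(tree, threshold=80)  # threshold is an assumed value
print(to_newick(tree))
print(to_newick(collapsed))
```

In this example the edge above the clade with support 40 is contracted, so the root of the returned tree has three children instead of two. The function builds a new tree rather than editing the input in place, so the original topology stays available for comparison.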