phyBWT2: phylogeny reconstruction via eBWT positional clustering
BACKGROUND: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring p...
Main authors: | Guerrini, Veronica; Conte, Alessio; Grossi, Roberto; Liti, Gianni; Rosone, Giovanna; Tattini, Lorenzo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399073/ https://www.ncbi.nlm.nih.gov/pubmed/37537624 http://dx.doi.org/10.1186/s13015-023-00232-4 |
_version_ | 1785084193763491840 |
---|---|
author | Guerrini, Veronica Conte, Alessio Grossi, Roberto Liti, Gianni Rosone, Giovanna Tattini, Lorenzo |
author_facet | Guerrini, Veronica Conte, Alessio Grossi, Roberto Liti, Gianni Rosone, Giovanna Tattini, Lorenzo |
author_sort | Guerrini, Veronica |
collection | PubMed |
description | BACKGROUND: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet several tools require pre-processed input data, e.g., from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that perform analysis directly on their outputs. From this viewpoint, there is a growing interest in alignment-, assembly-, and reference-free methods that can work on several types of data, including raw reads. RESULTS: We present phyBWT2, a newly improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23:1–23:19, 2022). Both reconstruct phylogenetic trees directly, bypassing both alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, as the former reconstructs phylogenetic trees step by step by considering multiple partitions, instead of just one partition at a time, as done by the latter.
CONCLUSIONS: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny by handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2, which improves the performance of its previous version, phyBWT, while preserving the accuracy of the results. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-023-00232-4. |
format | Online Article Text |
id | pubmed-10399073 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10399073 2023-08-04 phyBWT2: phylogeny reconstruction via eBWT positional clustering Guerrini, Veronica Conte, Alessio Grossi, Roberto Liti, Gianni Rosone, Giovanna Tattini, Lorenzo Algorithms Mol Biol Research BioMed Central 2023-08-03 /pmc/articles/PMC10399073/ /pubmed/37537624 http://dx.doi.org/10.1186/s13015-023-00232-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Guerrini, Veronica Conte, Alessio Grossi, Roberto Liti, Gianni Rosone, Giovanna Tattini, Lorenzo phyBWT2: phylogeny reconstruction via eBWT positional clustering |
title | phyBWT2: phylogeny reconstruction via eBWT positional clustering |
title_full | phyBWT2: phylogeny reconstruction via eBWT positional clustering |
title_fullStr | phyBWT2: phylogeny reconstruction via eBWT positional clustering |
title_full_unstemmed | phyBWT2: phylogeny reconstruction via eBWT positional clustering |
title_short | phyBWT2: phylogeny reconstruction via eBWT positional clustering |
title_sort | phybwt2: phylogeny reconstruction via ebwt positional clustering |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399073/ https://www.ncbi.nlm.nih.gov/pubmed/37537624 http://dx.doi.org/10.1186/s13015-023-00232-4 |
work_keys_str_mv | AT guerriniveronica phybwt2phylogenyreconstructionviaebwtpositionalclustering AT contealessio phybwt2phylogenyreconstructionviaebwtpositionalclustering AT grossiroberto phybwt2phylogenyreconstructionviaebwtpositionalclustering AT litigianni phybwt2phylogenyreconstructionviaebwtpositionalclustering AT rosonegiovanna phybwt2phylogenyreconstructionviaebwtpositionalclustering AT tattinilorenzo phybwt2phylogenyreconstructionviaebwtpositionalclustering |
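
The description field mentions the extended Burrows-Wheeler Transform (eBWT) and positional clustering without showing what these structures look like. The following is a minimal illustrative sketch, not the phyBWT2 implementation: it assumes the end-marker variant of the eBWT (each sequence terminated by `$`, ties between equal suffixes broken by sequence index), builds it by naive sorting, and cuts clusters with a simple minimum-LCP threshold. The function names (`ebwt_with_colors`, `clusters`) and the parameter `k_min` are hypothetical and chosen only for this example.

```python
def common_prefix_len(a, b):
    """Length of the common prefix of a and b, stopping at an end-marker.

    End-markers of different sequences are treated as distinct symbols,
    so a match never extends across '$'.
    """
    n = 0
    for x, y in zip(a, b):
        if x != y or x == "$":
            break
        n += 1
    return n


def ebwt_with_colors(seqs):
    """Return (ebwt, colors, lcp) for a small collection of strings.

    Naive construction by sorting all suffixes; fine for toy inputs only.
    colors[i] is the index of the sequence the i-th eBWT symbol comes from.
    lcp[i] is the longest common prefix of sorted suffixes i-1 and i.
    """
    rows = []
    for color, s in enumerate(seqs):
        t = s + "$"  # assumption: '$' does not occur in the input
        for i in range(len(t)):
            # t[i - 1] is the symbol preceding suffix i (cyclically, so
            # the whole string is preceded by its own end-marker).
            rows.append((t[i:], color, t[i - 1]))
    # '$' sorts before letters; equal suffixes are ordered by sequence index.
    rows.sort(key=lambda r: (r[0], r[1]))
    ebwt = "".join(r[2] for r in rows)
    colors = [r[1] for r in rows]
    lcp = [0] + [
        common_prefix_len(rows[i - 1][0], rows[i][0])
        for i in range(1, len(rows))
    ]
    return ebwt, colors, lcp


def clusters(lcp, k_min):
    """Maximal ranges (start, end) of eBWT positions whose suffixes share
    a prefix of length >= k_min (a simplified stand-in for positional
    clustering)."""
    out, start = [], None
    for i in range(1, len(lcp)):
        if lcp[i] >= k_min:
            if start is None:
                start = i - 1
        elif start is not None:
            out.append((start, i - 1))
            start = None
    if start is not None:
        out.append((start, len(lcp) - 1))
    return out


if __name__ == "__main__":
    ebwt, colors, lcp = ebwt_with_colors(["ACG", "ACT", "GCG"])
    for start, end in clusters(lcp, 2):
        # The set of colors in a cluster says which sequences share the
        # same left-context-bearing substring.
        print(ebwt[start:end + 1], sorted(set(colors[start:end + 1])))
```

In this toy setting, the set of colors found in a cluster is the kind of signal a partition-based method could use to group sequences without computing pairwise distances. Real tools build these arrays with dedicated external-memory or compressed algorithms rather than by sorting suffixes.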