Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model
Summary: While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, s...
Main authors: | Balaban, Metin; Bristy, Nishat Anjum; Faisal, Ahnaf; Bayzid, Md Shamsuzzoha; Mirarab, Siavash |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383262/ https://www.ncbi.nlm.nih.gov/pubmed/35992043 http://dx.doi.org/10.1093/bioadv/vbac055 |
Field | Value |
---|---|
author | Balaban, Metin; Bristy, Nishat Anjum; Faisal, Ahnaf; Bayzid, Md Shamsuzzoha; Mirarab, Siavash |
collection | PubMed |
description | Summary: While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes–Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data. AVAILABILITY AND IMPLEMENTATION: Our software is available open source at https://github.com/nishatbristy007/NSB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
format | Online Article Text |
id | pubmed-9383262 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinform Adv (Bioinformatics Advances), Original Paper |
published online | 2022-08-12 |
rights | © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383262/ https://www.ncbi.nlm.nih.gov/pubmed/35992043 http://dx.doi.org/10.1093/bioadv/vbac055 |
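The summary describes estimating distances from Jaccard indices between k-mer sets, recomputed after replacing letters in the input sequences. The Python sketch below illustrates that general idea under stated assumptions. It is not the NSB implementation and does not compute TK4 distances. The assumptions are as follows. It uses the Mash-style conversion from a Jaccard index J to a per-site distance, p = (1/k) ln((1 + J) / (2J)), followed by a Jukes–Cantor correction. It uses canonical k-mers because the strand is unknown. It also uses a hypothetical replacement that maps A to G and T to C, which hides every transition difference. All names (`canonical_kmers`, `mash_distance`, `MASK_TRANSITIONS`, `mutate`) are illustrative. The sketch also ignores k-mer matches that arise by random chance rather than homology, which the summary says must be accounted for on larger genomes.

```python
import math
import random

# Complement table; reverse complement = translate with this table, then reverse.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical strand-symmetric replacement (A->G, T->C). It commutes with
# complementation, so canonical k-mers stay well defined, and every
# transition difference (A<->G, C<->T) becomes invisible after replacement.
MASK_TRANSITIONS = str.maketrans("AT", "GC")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Return each k-mer or its reverse complement, whichever sorts first."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B| of two k-mer sets."""
    return len(a & b) / len(a | b)


def mash_distance(j: float, k: int) -> float:
    """Per-site mismatch rate implied by a Jaccard index j (Mash formula)."""
    if j <= 0:
        return math.inf
    return math.log((1 + j) / (2 * j)) / k


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a mismatch proportion p (valid for p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def mutate(seq: str, n: int, table: dict[str, str], rng: random.Random) -> str:
    """Substitute bases at n distinct random sites according to `table`."""
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n):
        bases[pos] = table[bases[pos]]
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(42)
    k = 21
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    # Toy pair that differs only by 100 transitions (about 1% of sites).
    transitions = {"A": "G", "G": "A", "C": "T", "T": "C"}
    other = mutate(genome, 100, transitions, rng)

    j_all = jaccard(canonical_kmers(genome, k), canonical_kmers(other, k))
    j_masked = jaccard(
        canonical_kmers(genome.translate(MASK_TRANSITIONS), k),
        canonical_kmers(other.translate(MASK_TRANSITIONS), k),
    )
    p_all = mash_distance(j_all, k)
    print(f"all bases:          J={j_all:.3f}  p={p_all:.4f}  JC={jukes_cantor(p_all):.4f}")
    print(f"transitions masked: J={j_masked:.3f}  p={mash_distance(j_masked, k):.4f}")
```

On this toy pair, the masked Jaccard index is exactly 1 because the replacement hides every difference. The gap between the two estimates is the kind of signal from which the frequencies of substitution types could be inferred. With a two-letter alphabet, however, unrelated k-mers collide far more often, so a real estimator would need the random-match correction the summary mentions.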