RabbitKSSD: accelerating genome distance estimation on modern multi-core architectures

SUMMARY: We propose RabbitKSSD, a high-speed genome distance estimation tool. Specifically, we leverage load-balanced task partitioning, fast I/O, efficient intermediate result accesses, and high-performance data structures to improve overall efficiency. Our performance evaluation demonstrates that...

Bibliographic Details
Main authors: Xu, Xiaoming; Yin, Zekun; Yan, Lifeng; Yi, Huiguang; Wang, Hua; Schmidt, Bertil; Liu, Weiguo
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Applications Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681859/
https://www.ncbi.nlm.nih.gov/pubmed/37971961
http://dx.doi.org/10.1093/bioinformatics/btad695
author Xu, Xiaoming
Yin, Zekun
Yan, Lifeng
Yi, Huiguang
Wang, Hua
Schmidt, Bertil
Liu, Weiguo
author_sort Xu, Xiaoming
collection PubMed
description SUMMARY: We propose RabbitKSSD, a high-speed genome distance estimation tool. Specifically, we leverage load-balanced task partitioning, fast I/O, efficient intermediate result accesses, and high-performance data structures to improve overall efficiency. Our performance evaluation demonstrates that RabbitKSSD achieves speedups ranging from 5.7× to 19.8× over Kssd for the time-consuming sketch generation and distance computation on commonly used workstations. In addition, it significantly outperforms Mash, BinDash, and Dashing2. Moreover, RabbitKSSD can efficiently perform all-vs-all distance computation for all RefSeq complete bacterial genomes (455 GB in FASTA format) in just 2 min on a 64-core workstation. AVAILABILITY AND IMPLEMENTATION: RabbitKSSD is available at https://github.com/RabbitBio/RabbitKSSD.
format Online
Article
Text
id pubmed-10681859
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10681859 2023-11-30 RabbitKSSD: accelerating genome distance estimation on modern multi-core architectures. Xu, Xiaoming; Yin, Zekun; Yan, Lifeng; Yi, Huiguang; Wang, Hua; Schmidt, Bertil; Liu, Weiguo. Bioinformatics, Applications Note. Oxford University Press 2023-11-16 /pmc/articles/PMC10681859/ /pubmed/37971961 http://dx.doi.org/10.1093/bioinformatics/btad695 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title RabbitKSSD: accelerating genome distance estimation on modern multi-core architectures
topic Applications Note