
Efficient and Robust Search of Microbial Genomes via Phylogenetic Compression

Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections makes it effectively impossible to search these data using tools such as BLAST and its successors. Here, we present a techn...


Bibliographic Details
Main authors: Břinda, Karel; Lima, Leandro; Pignotti, Simone; Quinones-Olvera, Natalia; Salikhov, Kamil; Chikhi, Rayan; Kucherov, Gregory; Iqbal, Zamin; Baym, Michael
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153118/
https://www.ncbi.nlm.nih.gov/pubmed/37131636
http://dx.doi.org/10.1101/2023.04.15.536996
author Břinda, Karel
Lima, Leandro
Pignotti, Simone
Quinones-Olvera, Natalia
Salikhov, Kamil
Chikhi, Rayan
Kucherov, Gregory
Iqbal, Zamin
Baym, Michael
collection PubMed
description Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections makes it effectively impossible to search these data using tools such as BLAST and its successors. Here, we present a technique called phylogenetic compression, which uses evolutionary history to guide compression and efficiently search large collections of microbial genomes using existing algorithms and data structures. We show that, when applied to modern diverse collections approaching millions of genomes, lossless phylogenetic compression improves the compression ratios of assemblies, de Bruijn graphs, and k-mer indexes by one to two orders of magnitude. Additionally, we develop a pipeline for a BLAST-like search over these phylogeny-compressed reference data, and demonstrate it can align genes, plasmids, or entire sequencing experiments against all sequenced bacteria until 2019 on ordinary desktop computers within a few hours. Phylogenetic compression has broad applications in computational biology and may provide a fundamental design principle for future genomics infrastructure.
format Online
Article
Text
id pubmed-10153118
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10153118 2023-05-03 bioRxiv Article Cold Spring Harbor Laboratory 2023-04-18 /pmc/articles/PMC10153118/ /pubmed/37131636 http://dx.doi.org/10.1101/2023.04.15.536996 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
title Efficient and Robust Search of Microbial Genomes via Phylogenetic Compression
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153118/
https://www.ncbi.nlm.nih.gov/pubmed/37131636
http://dx.doi.org/10.1101/2023.04.15.536996