Ultra-fast search of all deposited bacterial and viral genomic data

Exponentially increasing amounts of unprocessed bacterial and viral genomic sequence data are stored in the global archives. The ability to query these data for sequence search-terms would facilitate both basic research and applications such as real-time genomic epidemiology and surveillance. Howeve...

Bibliographic Details
Main Authors: Bradley, Phelim; Den Bakker, Henk C; Rocha, Eduardo P. C.; McVean, Gil; Iqbal, Zamin
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6420049/
https://www.ncbi.nlm.nih.gov/pubmed/30718882
http://dx.doi.org/10.1038/s41587-018-0010-1
author Bradley, Phelim
Den Bakker, Henk C
Rocha, Eduardo P. C.
McVean, Gil
Iqbal, Zamin
collection PubMed
description Exponentially increasing amounts of unprocessed bacterial and viral genomic sequence data are stored in the global archives. The ability to query these data for sequence search-terms would facilitate both basic research and applications such as real-time genomic epidemiology and surveillance. However, this is not possible with current methods. To solve this problem, we combine knowledge of microbial population genomics with computational methods devised for web-search to produce a searchable data structure named Bitsliced Genomic Signature Index (BIGSI). We indexed the entire global corpus of 447,833 bacterial and viral whole genome sequence datasets using 4 orders of magnitude less storage than previous methods. We applied our BIGSI search function to rapidly find resistance genes MCR-1/2/3, determine the host-range of 2827 plasmids, and quantify antibiotic resistance in archived datasets. Our index can grow incrementally as new (unprocessed or assembled) sequence datasets are deposited and can scale to millions of datasets.
format Online
Article
Text
id pubmed-6420049
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6420049 2019-08-04 Nat Biotechnol Article 2019-02-04 2019-02 /pmc/articles/PMC6420049/ /pubmed/30718882 http://dx.doi.org/10.1038/s41587-018-0010-1 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title Ultra-fast search of all deposited bacterial and viral genomic data
topic Article