quickBAM: a parallelized BAM file access API for high-throughput sequence analysis informatics

MOTIVATION: In time-critical clinical settings, such as precision medicine, genomic data needs to be processed as fast as possible to arrive at data-informed treatment decisions in a timely fashion. While sequencing throughput has dramatically increased over the past decade, bioinformatics analysis...

Bibliographic Details
Main authors: Pitman, Anders; Huang, Xiaomeng; Marth, Gabor T; Qiao, Yi
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412403/
https://www.ncbi.nlm.nih.gov/pubmed/37498562
http://dx.doi.org/10.1093/bioinformatics/btad463
author Pitman, Anders
Huang, Xiaomeng
Marth, Gabor T
Qiao, Yi
collection PubMed
description MOTIVATION: In time-critical clinical settings, such as precision medicine, genomic data needs to be processed as fast as possible to arrive at data-informed treatment decisions in a timely fashion. While sequencing throughput has dramatically increased over the past decade, bioinformatics analysis throughput has not been able to keep up with the pace of computer hardware improvement, and consequently has now turned into the primary bottleneck. Modern computer hardware is capable of much higher performance than current genomic informatics algorithms can typically utilize, which presents opportunities for significant performance improvement. For example, accessing the raw sequencing data from BAM files is a necessary and time-consuming step in nearly all sequence analysis tools; however, existing programming libraries for BAM access do not take full advantage of the parallel input/output capabilities of storage devices. RESULTS: In an effort to stimulate the development of a new generation of faster sequence analysis tools, we developed quickBAM, a software library that accelerates sequencing data access by exploiting the parallelism in widely available commodity storage hardware. We demonstrate that analysis software ported to quickBAM consistently outperforms its current versions, in some cases finishing an analysis in under 3 min while the original version took 1.5 h, using the same storage solution. AVAILABILITY AND IMPLEMENTATION: quickBAM is open source and freely available at https://gitlab.com/yiq/quickbam/. We envision that quickBAM will enable a new generation of high-performance informatics tools, either directly boosting their performance if they are currently bottlenecked by data access, or allowing data access to keep up with further optimizations in algorithms and compute techniques.
format Online
Article
Text
id pubmed-10412403
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10412403 2023-08-11 quickBAM: a parallelized BAM file access API for high-throughput sequence analysis informatics Pitman, Anders Huang, Xiaomeng Marth, Gabor T Qiao, Yi Bioinformatics Original Paper Oxford University Press 2023-07-27 /pmc/articles/PMC10412403/ /pubmed/37498562 http://dx.doi.org/10.1093/bioinformatics/btad463 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title quickBAM: a parallelized BAM file access API for high-throughput sequence analysis informatics
topic Original Paper