Oculus: faster sequence alignment by streaming read compression

BACKGROUND: Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in...

Bibliographic Details
Main authors: Veeneman, Brendan A; Iyer, Matthew K; Chinnaiyan, Arul M
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534618/
https://www.ncbi.nlm.nih.gov/pubmed/23148484
http://dx.doi.org/10.1186/1471-2105-13-297
author Veeneman, Brendan A
Iyer, Matthew K
Chinnaiyan, Arul M
collection PubMed
description BACKGROUND: Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves. RESULTS: Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases. CONCLUSIONS: Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at http://code.google.com/p/oculus-bio.
format Online
Article
Text
id pubmed-3534618
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3534618 2013-01-03 Oculus: faster sequence alignment by streaming read compression Veeneman, Brendan A Iyer, Matthew K Chinnaiyan, Arul M BMC Bioinformatics Software BioMed Central 2012-11-13 /pmc/articles/PMC3534618/ /pubmed/23148484 http://dx.doi.org/10.1186/1471-2105-13-297 Text en Copyright ©2012 Veeneman et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Oculus: faster sequence alignment by streaming read compression
topic Software
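
The description above says that Oculus exploits read redundancy by compressing the input reads, aligning them, and decompressing the results. The general idea can be sketched as follows. This is only an illustrative sketch, not the Oculus implementation (which is written in C++): the names `compress_reads`, `align_with_compression`, and `toy_aligner` are hypothetical, grouping by exact sequence identity is an assumption, and the "aligner" is a plain substring search standing in for a real alignment tool.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Read = Tuple[str, str]  # (read name, sequence)
Hit = Tuple[str, int]   # (sequence, position reported by the aligner)


def compress_reads(reads: Iterable[Read]) -> Dict[str, List[str]]:
    """Group read names by identical sequence in one pass over the input."""
    groups: Dict[str, List[str]] = {}
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    return groups


def align_with_compression(
    reads: Iterable[Read],
    align_unique: Callable[[Iterable[str]], Iterator[Hit]],
) -> Iterator[Tuple[str, str, int]]:
    """Align each distinct sequence once, then expand to one record per read."""
    groups = compress_reads(reads)
    for seq, position in align_unique(groups.keys()):
        # "Decompression": every read that shared this sequence gets the hit.
        for name in groups[seq]:
            yield name, seq, position


def toy_aligner(seqs: Iterable[str]) -> Iterator[Hit]:
    """Hypothetical stand-in for a real aligner: exact substring search."""
    reference = "ACGTACGTTTGACCA"
    for seq in seqs:
        yield seq, reference.find(seq)  # -1 means "unaligned"


reads = [("r1", "ACGT"), ("r2", "TTGA"), ("r3", "ACGT"), ("r4", "GGGG")]
unique = compress_reads(reads)
results = list(align_with_compression(reads, toy_aligner))

print(len(reads), "reads,", len(unique), "distinct sequences")
for record in results:
    print(record)
```

In this sketch the aligner sees three sequences instead of four reads, so the saving grows with the amount of redundancy in the input. Records come out grouped by sequence rather than in the original read order, and per-read information such as quality strings is not carried through; a real tool would need to handle both.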