Boiler: lossy compression of RNA-seq alignments using coverage vectors
We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footp...
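The summary above says Boiler keeps a genomic coverage vector plus a few empirical distributions in place of per-read data. The following is a minimal sketch of that general idea, not Boiler's actual code or file format: the function names, the half-open `(start, end)` interval convention, and the choice of run-length encoding are all assumptions made for illustration.

```python
from itertools import groupby


def coverage_vector(alignments, length):
    """Per-base coverage for half-open (start, end) intervals.

    Hypothetical helper: a difference array is filled in one pass,
    then a prefix sum yields the depth at each position.
    """
    diff = [0] * (length + 1)
    for start, end in alignments:
        diff[start] += 1   # coverage rises where an alignment begins
        diff[end] -= 1     # and falls just past where it ends
    coverage, depth = [], 0
    for delta in diff[:length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(vector):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(vector)]


def read_length_distribution(alignments):
    """Empirical distribution of aligned lengths, as {length: count}."""
    counts = {}
    for start, end in alignments:
        counts[end - start] = counts.get(end - start, 0) + 1
    return counts


# Three toy alignments on a region of 8 bases (illustrative data only).
alignments = [(0, 4), (2, 6), (2, 6)]
coverage = coverage_vector(alignments, 8)
encoded = run_length_encode(coverage)
lengths = read_length_distribution(alignments)
```

In this sketch only `encoded` and `lengths` would be stored. The individual alignments are not kept, which is what makes a scheme of this kind lossy: reads consistent with the summaries can be regenerated, but not necessarily the original ones.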
Main authors: | Pritt, Jacob; Langmead, Ben |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5027496/ https://www.ncbi.nlm.nih.gov/pubmed/27298258 http://dx.doi.org/10.1093/nar/gkw540 |
_version_ | 1782454244881727488 |
---|---|
author | Pritt, Jacob Langmead, Ben |
author_facet | Pritt, Jacob Langmead, Ben |
author_sort | Pritt, Jacob |
collection | PubMed |
description | We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler. |
format | Online Article Text |
id | pubmed-5027496 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-50274962016-09-21 Boiler: lossy compression of RNA-seq alignments using coverage vectors Pritt, Jacob Langmead, Ben Nucleic Acids Res Methods Online We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler. Oxford University Press 2016-09-19 2016-06-13 /pmc/articles/PMC5027496/ /pubmed/27298258 http://dx.doi.org/10.1093/nar/gkw540 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Pritt, Jacob Langmead, Ben Boiler: lossy compression of RNA-seq alignments using coverage vectors |
title | Boiler: lossy compression of RNA-seq alignments using coverage vectors |
title_full | Boiler: lossy compression of RNA-seq alignments using coverage vectors |
title_fullStr | Boiler: lossy compression of RNA-seq alignments using coverage vectors |
title_full_unstemmed | Boiler: lossy compression of RNA-seq alignments using coverage vectors |
title_short | Boiler: lossy compression of RNA-seq alignments using coverage vectors |
title_sort | boiler: lossy compression of rna-seq alignments using coverage vectors |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5027496/ https://www.ncbi.nlm.nih.gov/pubmed/27298258 http://dx.doi.org/10.1093/nar/gkw540 |
work_keys_str_mv | AT prittjacob boilerlossycompressionofrnaseqalignmentsusingcoveragevectors AT langmeadben boilerlossycompressionofrnaseqalignmentsusingcoveragevectors |