IonCRAM: a reference-based compression tool for ion torrent sequence files

Bibliographic Details
Main authors: Shokrof, Moustafa; Abouelhoda, Mohamed
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487613/
https://www.ncbi.nlm.nih.gov/pubmed/32907531
http://dx.doi.org/10.1186/s12859-020-03726-9
author Shokrof, Moustafa
Abouelhoda, Mohamed
collection PubMed
description BACKGROUND: Ion Torrent is one of the major next-generation sequencing (NGS) technologies and is frequently used in medical research and diagnosis. The built-in software of the Ion Torrent sequencing machines delivers the sequencing results in the BAM format. In addition to the usual SAM/BAM fields, the Ion Torrent BAM file includes technology-specific flow signal data. The flow signals occupy a large portion of the BAM file (about 75% for the human genome). Compressing SAM/BAM into the CRAM format significantly reduces the space needed to store NGS results. However, the tools for generating CRAM files are not designed to handle the flow signals. This missing feature motivated us to develop a new program to improve the compression of Ion Torrent files for long-term archiving. RESULTS: In this paper, we present IonCRAM, the first reference-based compression tool for Ion Torrent BAM files intended for long-term archiving. For the BAM files, IonCRAM achieved a space saving of about 43%, which exceeds the saving achieved with the CRAM format by about 8–9%. CONCLUSIONS: Reducing the space consumption of NGS data reduces the cost of storage and data transfer. Therefore, developing efficient compression software for clinical NGS data goes beyond computational interest, as it ultimately contributes to reducing the overall cost of the clinical test. The space saving achieved by our tool is a practical step in this direction. The tool is open source and available at Code Ocean, GitHub, and http://ioncram.saudigenomeproject.com.
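The abstract names reference-based compression without describing it. As a rough, hypothetical illustration of the general idea behind CRAM-style formats (not IonCRAM's actual encoding, which the record does not describe), an aligned read can be stored as its position plus only the bases that differ from the reference, and restored by patching the reference. The sketch assumes an ungapped alignment (no indels or clipping), and the names `EncodedRead`, `encode`, `decode`, and `space_saving` are invented for this example. The `space_saving` helper assumes the common definition 1 − compressed size / original size, which the record does not state explicitly.

```python
"""Toy illustration of reference-based encoding (hypothetical; not IonCRAM's format)."""
from dataclasses import dataclass


@dataclass
class EncodedRead:
    """A read stored as its alignment position plus mismatching bases only."""
    pos: int                           # 0-based start position on the reference
    length: int                        # read length in bases
    mismatches: list[tuple[int, str]]  # (offset within read, base) where read != reference


def encode(read: str, reference: str, pos: int) -> EncodedRead:
    """Keep only bases that differ from the reference (assumes an ungapped alignment)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, b) for i, (b, r) in enumerate(zip(read, window)) if b != r]
    return EncodedRead(pos, len(read), diffs)


def decode(enc: EncodedRead, reference: str) -> str:
    """Rebuild the read by copying the reference window and patching mismatches."""
    bases = list(reference[enc.pos:enc.pos + enc.length])
    for offset, base in enc.mismatches:
        bases[offset] = base
    return "".join(bases)


def space_saving(original_size: float, compressed_size: float) -> float:
    """Fraction of storage saved, assumed here to be 1 - compressed/original."""
    return 1.0 - compressed_size / original_size


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCGATAACG"
    read = "GTACCTTAGC"  # aligns at position 2 with a single mismatch
    enc = encode(read, ref, 2)
    print(enc)  # EncodedRead(pos=2, length=10, mismatches=[(4, 'C')])
    assert decode(enc, ref) == read  # lossless round trip
    print(f"{space_saving(100.0, 57.0):.0%}")  # 43%
```

Because a read that mostly matches the reference reduces to a position and a short mismatch list, the sequence itself compresses well; per-read data that has no reference counterpart, such as the Ion Torrent flow signals, does not benefit from this trick, which is the gap the abstract says IonCRAM addresses.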
format Online
Article
Text
id pubmed-7487613
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7487613 2020-09-16 IonCRAM: a reference-based compression tool for ion torrent sequence files Shokrof, Moustafa Abouelhoda, Mohamed BMC Bioinformatics Software
BioMed Central 2020-09-09 /pmc/articles/PMC7487613/ /pubmed/32907531 http://dx.doi.org/10.1186/s12859-020-03726-9 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title IonCRAM: a reference-based compression tool for ion torrent sequence files
topic Software