SOLiDzipper: A High Speed Encoding Method for the Next-Generation Sequencing Data

BACKGROUND: Next-generation sequencing (NGS) methods pose computational challenges of handling large volumes of data. Although cloud computing offers a potential solution to these challenges, transferring a large data set across the internet is the biggest obstacle, which may be overcome by efficien...

Bibliographic Details
Main authors: Jeon, Young Jun; Park, Sang Hyun; Ahn, Sung Min; Hwang, Hee Joung
Format: Text
Language: English
Published: Libertas Academica 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3072624/
https://www.ncbi.nlm.nih.gov/pubmed/21487532
http://dx.doi.org/10.4137/EBO.S6618
author Jeon, Young Jun
Park, Sang Hyun
Ahn, Sung Min
Hwang, Hee Joung
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) methods pose the computational challenge of handling large volumes of data. Although cloud computing offers a potential solution, transferring a large data set across the internet is the biggest obstacle, one that may be overcome by efficient encoding methods. When encoding is used to facilitate data transfer to the cloud, encoding time is as important as encoding efficiency. Moreover, to take advantage of parallel processing in cloud computing, a parallel technique to decode and split compressed data in the cloud is essential. Hence, in this review, we present SOLiDzipper, a new encoding method for NGS data. METHODS: The basic strategy of SOLiDzipper is to divide and encode. NGS data files contain both sequence and non-sequence information, which differ in encoding efficiency. In SOLiDzipper, encoded data are stored in a binary data block that does not contain information characteristic of a specific sequencing platform, which means that the data can be decoded for a desired platform, even in the case of Illumina, Solexa or Roche 454 data. RESULTS: The main calculation time using Crossbow was 173 minutes when 40 EC2 nodes were involved. In that case, an analysis preparation time of 464 minutes is required to encode the data with one of the latest DNA compression methods, G-SQZ, and transmit it over a 183 Mbit/s connection. By contrast, it takes 194 minutes to encode and transmit the data with SOLiDzipper under the same bandwidth conditions. These results indicate that the total processing time can be reduced by the choice of encoding method under the same network bandwidth conditions. Considering the limited network bandwidth, high-speed, high-efficiency encoding methods such as SOLiDzipper can make a significant contribution to higher productivity in labs seeking to take advantage of the cloud as an alternative to local computing. AVAILABILITY: http://szipper.dinfree.com.
Academic/non-profit: binary available for direct download at no cost. For-profit: submit a request for a for-profit license through the website.
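The timing figures quoted in the RESULTS portion of the description can be combined into a rough end-to-end comparison. The short Python sketch below does that arithmetic. It assumes that preparation (encode plus transmit) and the Crossbow computation run strictly one after the other, which the abstract does not state, and the function and variable names are hypothetical, chosen only for illustration.

```python
# Figures quoted in the abstract, in minutes.
COMPUTE_MIN = 173            # Crossbow main calculation on 40 EC2 nodes
PREP_GSQZ_MIN = 464          # encode with G-SQZ and transmit at 183 Mbit/s
PREP_SOLIDZIPPER_MIN = 194   # encode with SOLiDzipper and transmit, same bandwidth


def end_to_end(prep_min: int, compute_min: int = COMPUTE_MIN) -> int:
    """Total minutes, ASSUMING preparation and computation do not overlap."""
    return prep_min + compute_min


total_gsqz = end_to_end(PREP_GSQZ_MIN)
total_solidzipper = end_to_end(PREP_SOLIDZIPPER_MIN)

# Absolute and relative savings under the serial-pipeline assumption.
saved_min = total_gsqz - total_solidzipper
saved_fraction = saved_min / total_gsqz

# Ratio of the two preparation times alone (no assumption needed here).
prep_ratio = PREP_GSQZ_MIN / PREP_SOLIDZIPPER_MIN

print(f"G-SQZ end to end:       {total_gsqz} min")
print(f"SOLiDzipper end to end: {total_solidzipper} min")
print(f"Saved: {saved_min} min ({saved_fraction:.1%})")
print(f"Preparation time ratio: {prep_ratio:.2f}x")
```

Under that assumption the totals come to 637 and 367 minutes, a saving of 270 minutes. If transfer and computation overlapped in practice, the end-to-end difference would be smaller than this estimate.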
format Text
id pubmed-3072624
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3072624 2011-04-12
SOLiDzipper: A High Speed Encoding Method for the Next-Generation Sequencing Data
Jeon, Young Jun; Park, Sang Hyun; Ahn, Sung Min; Hwang, Hee Joung
Evol Bioinform Online
Software or Database Review
Libertas Academica 2011-03-10
/pmc/articles/PMC3072624/ /pubmed/21487532 http://dx.doi.org/10.4137/EBO.S6618
Text en
© the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title SOLiDzipper: A High Speed Encoding Method for the Next-Generation Sequencing Data
topic Software or Database Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3072624/
https://www.ncbi.nlm.nih.gov/pubmed/21487532
http://dx.doi.org/10.4137/EBO.S6618