Closha: bioinformatics workflow system for the analysis of massive sequencing data

BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods...

Bibliographic Details
Main authors: Ko, GunHwan, Kim, Pan-Gyu, Yoon, Jongcheol, Han, Gukhee, Park, Seong-Jin, Song, Wangho, Lee, Byungwook
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836837/
https://www.ncbi.nlm.nih.gov/pubmed/29504905
http://dx.doi.org/10.1186/s12859-018-2019-3
_version_ 1783304014750810112
author Ko, GunHwan
Kim, Pan-Gyu
Yoon, Jongcheol
Han, Gukhee
Park, Seong-Jin
Song, Wangho
Lee, Byungwook
author_facet Ko, GunHwan
Kim, Pan-Gyu
Yoon, Jongcheol
Han, Gukhee
Park, Seong-Jin
Song, Wangho
Lee, Byungwook
author_sort Ko, GunHwan
collection PubMed
description BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. 
The Closha cloud server is freely available for use from http://closha.kobic.re.kr/.
format Online
Article
Text
id pubmed-5836837
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-58368372018-03-07 Closha: bioinformatics workflow system for the analysis of massive sequencing data Ko, GunHwan Kim, Pan-Gyu Yoon, Jongcheol Han, Gukhee Park, Seong-Jin Song, Wangho Lee, Byungwook BMC Bioinformatics Methodology BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. 
Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/. BioMed Central 2018-02-19 /pmc/articles/PMC5836837/ /pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Ko, GunHwan
Kim, Pan-Gyu
Yoon, Jongcheol
Han, Gukhee
Park, Seong-Jin
Song, Wangho
Lee, Byungwook
Closha: bioinformatics workflow system for the analysis of massive sequencing data
title Closha: bioinformatics workflow system for the analysis of massive sequencing data
title_full Closha: bioinformatics workflow system for the analysis of massive sequencing data
title_fullStr Closha: bioinformatics workflow system for the analysis of massive sequencing data
title_full_unstemmed Closha: bioinformatics workflow system for the analysis of massive sequencing data
title_short Closha: bioinformatics workflow system for the analysis of massive sequencing data
title_sort closha: bioinformatics workflow system for the analysis of massive sequencing data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836837/
https://www.ncbi.nlm.nih.gov/pubmed/29504905
http://dx.doi.org/10.1186/s12859-018-2019-3