Closha: bioinformatics workflow system for the analysis of massive sequencing data
BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods...
Main authors: | Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836837/ https://www.ncbi.nlm.nih.gov/pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3 |
_version_ | 1783304014750810112 |
---|---|
author | Ko, GunHwan Kim, Pan-Gyu Yoon, Jongcheol Han, Gukhee Park, Seong-Jin Song, Wangho Lee, Byungwook |
author_sort | Ko, GunHwan |
collection | PubMed |
description | BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. 
The Closha cloud server is freely available for use from http://closha.kobic.re.kr/. |
format | Online Article Text |
id | pubmed-5836837 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5836837 2018-03-07 Closha: bioinformatics workflow system for the analysis of massive sequencing data Ko, GunHwan Kim, Pan-Gyu Yoon, Jongcheol Han, Gukhee Park, Seong-Jin Song, Wangho Lee, Byungwook BMC Bioinformatics Methodology BioMed Central 2018-02-19 /pmc/articles/PMC5836837/ /pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Closha: bioinformatics workflow system for the analysis of massive sequencing data |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836837/ https://www.ncbi.nlm.nih.gov/pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3 |