Closha: bioinformatics workflow system for the analysis of massive sequencing data

BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods...

Bibliographic Details
Main Authors:	Ko, GunHwan, Kim, Pan-Gyu, Yoon, Jongcheol, Han, Gukhee, Park, Seong-Jin, Song, Wangho, Lee, Byungwook
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836837/ https://www.ncbi.nlm.nih.gov/pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3

author	Ko, GunHwan Kim, Pan-Gyu Yoon, Jongcheol Han, Gukhee Park, Seong-Jin Song, Wangho Lee, Byungwook
collection	PubMed
description	BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. 
The Closha cloud server is freely available for use from http://closha.kobic.re.kr/.
format	Online Article Text
id	pubmed-5836837
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-58368372018-03-07 Closha: bioinformatics workflow system for the analysis of massive sequencing data Ko, GunHwan Kim, Pan-Gyu Yoon, Jongcheol Han, Gukhee Park, Seong-Jin Song, Wangho Lee, Byungwook BMC Bioinformatics Methodology BACKGROUND: While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. RESULTS: To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. CONCLUSIONS: Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. 
Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/. BioMed Central 2018-02-19 /pmc/articles/PMC5836837/ /pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Closha: bioinformatics workflow system for the analysis of massive sequencing data
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5836837/ https://www.ncbi.nlm.nih.gov/pubmed/29504905 http://dx.doi.org/10.1186/s12859-018-2019-3


Similar Items