CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce

Bibliographic Details
Main authors:	Chung, Wei-Chun, Chen, Chien-Chih, Ho, Jan-Ming, Lin, Chung-Yen, Hsu, Wen-Lian, Wang, Yu-Chun, Lee, D. T., Lai, Feipei, Huang, Chih-Wei, Chang, Yu-Jung
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4045712/ https://www.ncbi.nlm.nih.gov/pubmed/24897343 http://dx.doi.org/10.1371/journal.pone.0098146

author	Chung, Wei-Chun Chen, Chien-Chih Ho, Jan-Ming Lin, Chung-Yen Hsu, Wen-Lian Wang, Yu-Chun Lee, D. T. Lai, Feipei Huang, Chih-Wei Chang, Yu-Jung
author_sort	Chung, Wei-Chun
collection	PubMed
description	BACKGROUND: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. RESULTS: We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus freeing scientists from having to perform complicated operational procedures. The user interface guides users through deploying a Hadoop cloud within in-house computing environments and running applications specifically targeted at bioinformatics, including CloudBurst, CloudBrush, and CloudRS. CloudDOE can also be used on top of a public cloud. CloudDOE consists of three wizards: Deploy, Operate, and Extend. The Deploy wizard helps the system administrator deploy a Hadoop cloud. It installs Java Runtime Environment version 1.6 and Hadoop version 0.20.203, and starts the service automatically. The Operate wizard allows the user to run a MapReduce application from the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using the Extend wizard. CONCLUSIONS: CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE, to incorporate more MapReduce bioinformatics tools into it, and to support next-generation open-source big data tools, e.g., Hadoop BigTop and Spark. Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
format	Online Article Text
id	pubmed-4045712
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-40457122014-06-09 CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce Chung, Wei-Chun Chen, Chien-Chih Ho, Jan-Ming Lin, Chung-Yen Hsu, Wen-Lian Wang, Yu-Chun Lee, D. T. Lai, Feipei Huang, Chih-Wei Chang, Yu-Jung PLoS One Research Article BACKGROUND: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. RESULTS: We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. 
Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. CONCLUSIONS: CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/. Public Library of Science 2014-06-04 /pmc/articles/PMC4045712/ /pubmed/24897343 http://dx.doi.org/10.1371/journal.pone.0098146 Text en © 2014 Chung et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce
topic	Research Article

