cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud

Bibliographic Details
Main authors: Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren
Format: Online article, text
Language: English
Published: Oxford University Press, 2016
Subjects: Applications Notes
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4708102/
https://www.ncbi.nlm.nih.gov/pubmed/26428290
http://dx.doi.org/10.1093/bioinformatics/btv553
Collection: PubMed

Full description:
Summary: One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern.
Availability and implementation: Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git.
Contact: hodor_paul@bah.com

Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (Applications Notes), Oxford University Press
Dates: published online 2015-10-01; issue date 2016-01-15
Rights: © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.