
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues rel...


Bibliographic Details
Main Authors:	Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2012
Subjects:	Applications Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307120/ https://www.ncbi.nlm.nih.gov/pubmed/22302568 http://dx.doi.org/10.1093/bioinformatics/bts054

author	Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
author_sort	Niemenmaa, Matti
collection	PubMed
description	Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps. Availability: Available under the open-source MIT license at http://sourceforge.net/projects/hadoop-bam/ Contact: matti.niemenmaa@aalto.fi Supplementary information: Supplementary material is available at Bioinformatics online.
format	Online Article Text
id	pubmed-3307120
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3307120 2012-03-19 Bioinformatics Applications Note Oxford University Press 2012-03-15 2012-02-02 /pmc/articles/PMC3307120/ /pubmed/22302568 http://dx.doi.org/10.1093/bioinformatics/bts054 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Hadoop-BAM: directly manipulating next generation sequencing data in the cloud
topic	Applications Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307120/ https://www.ncbi.nlm.nih.gov/pubmed/22302568 http://dx.doi.org/10.1093/bioinformatics/bts054

