DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster

With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliora...

Bibliographic Details
Main authors: Pandey, Ram Vinay, Schlötterer, Christian
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751911/
https://www.ncbi.nlm.nih.gov/pubmed/24009693
http://dx.doi.org/10.1371/journal.pone.0072614
_version_ 1782281700423761920
author Pandey, Ram Vinay
Schlötterer, Christian
author_facet Pandey, Ram Vinay
Schlötterer, Christian
author_sort Pandey, Ram Vinay
collection PubMed
description With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
format Online
Article
Text
id pubmed-3751911
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-37519112013-09-05 DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster Pandey, Ram Vinay Schlötterer, Christian PLoS One Research Article With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/ Public Library of Science 2013-08-23 /pmc/articles/PMC3751911/ /pubmed/24009693 http://dx.doi.org/10.1371/journal.pone.0072614 Text en © 2013 Pandey et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Pandey, Ram Vinay
Schlötterer, Christian
DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
title DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
title_full DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
title_fullStr DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
title_full_unstemmed DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
title_short DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster
title_sort distmap: a toolkit for distributed short read mapping on a hadoop cluster
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751911/
https://www.ncbi.nlm.nih.gov/pubmed/24009693
http://dx.doi.org/10.1371/journal.pone.0072614
work_keys_str_mv AT pandeyramvinay distmapatoolkitfordistributedshortreadmappingonahadoopcluster
AT schlottererchristian distmapatoolkitfordistributedshortreadmappingonahadoopcluster