Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce

Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run effici...

Bibliographic Details
Main authors:	Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373595/ https://www.ncbi.nlm.nih.gov/pubmed/28358893 http://dx.doi.org/10.1371/journal.pone.0174575

author	Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
author_sort	Decap, Dries
collection	PubMed
description	Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
format	Online Article Text
id	pubmed-5373595
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5373595 2017-04-07 Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan PLoS One Research Article Public Library of Science 2017-03-30 /pmc/articles/PMC5373595/ /pubmed/28358893 http://dx.doi.org/10.1371/journal.pone.0174575 Text en © 2017 Decap et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373595/ https://www.ncbi.nlm.nih.gov/pubmed/28358893 http://dx.doi.org/10.1371/journal.pone.0174575
