HSRA: Hadoop-based spliced read aligner for RNA sequencing data

Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-...

Bibliographic Details
Main authors:	Expósito, Roberto R., González-Domínguez, Jorge, Touriño, Juan
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6067734/ https://www.ncbi.nlm.nih.gov/pubmed/30063721 http://dx.doi.org/10.1371/journal.pone.0201483

author	Expósito, Roberto R. González-Domínguez, Jorge Touriño, Juan
author_facet	Expósito, Roberto R. González-Domínguez, Jorge Touriño, Juan
author_sort	Expósito, Roberto R.
collection	PubMed
description	Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user’s guide are publicly available for download at http://hsra.dec.udc.es.
format	Online Article Text
id	pubmed-6067734
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6067734 2018-08-10 HSRA: Hadoop-based spliced read aligner for RNA sequencing data Expósito, Roberto R. González-Domínguez, Jorge Touriño, Juan PLoS One Research Article Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user’s guide are publicly available for download at http://hsra.dec.udc.es.
Public Library of Science 2018-07-31 /pmc/articles/PMC6067734/ /pubmed/30063721 http://dx.doi.org/10.1371/journal.pone.0201483 Text en © 2018 Expósito et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Expósito, Roberto R. González-Domínguez, Jorge Touriño, Juan HSRA: Hadoop-based spliced read aligner for RNA sequencing data
title	HSRA: Hadoop-based spliced read aligner for RNA sequencing data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6067734/ https://www.ncbi.nlm.nih.gov/pubmed/30063721 http://dx.doi.org/10.1371/journal.pone.0201483

Similar items