
PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis

BACKGROUND: High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates very large volumes of data, which challenges scientists to find efficient, cost-effective and time-effective ways to analyse such data....


Bibliographic Details
Main Authors:	Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4063226/ https://www.ncbi.nlm.nih.gov/pubmed/24894600 http://dx.doi.org/10.1186/1471-2105-15-167

_version_	1782321769686761472
author	Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur
author_facet	Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur
author_sort	Maji, Ranjan Kumar
collection	PubMed
description	BACKGROUND: High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates very large volumes of data, which challenges scientists to find efficient, cost-effective and time-effective ways to analyse such data. Further, the different types of NGS data share certain challenging analysis steps. Spliced alignment is one such fundamental step, and it is extremely computationally intensive and time consuming. Even the most widely used spliced alignment tools have serious problems. TopHat is one such widely used tool: although it supports multithreading, it does not use computational resources efficiently in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), which takes a modular approach by breaking TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. We thus address these shortcomings of TopHat so that large NGS data can be analysed efficiently. RESULTS: We analysed the SRA datasets SRX026839 and SRX026838, consisting of single-end reads, and the SRA dataset SRR1027730, consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and recorded the CPU usage, memory footprint and execution time during spliced alignment. With this baseline information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during ‘spliced alignment’ and breaks the job into a pipeline of multiple stages (each comprising one or more steps) to improve resource utilization, thus reducing the execution time. CONCLUSIONS: PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT reduced the execution time to ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system.
format	Online Article Text
id	pubmed-4063226
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4063226 2014-06-30 PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur BMC Bioinformatics Methodology Article BioMed Central 2014-06-04 /pmc/articles/PMC4063226/ /pubmed/24894600 http://dx.doi.org/10.1186/1471-2105-15-167 Text en Copyright © 2014 Maji et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis
title_sort	pvt: an efficient computational procedure to speed up next-generation sequence analysis
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4063226/ https://www.ncbi.nlm.nih.gov/pubmed/24894600 http://dx.doi.org/10.1186/1471-2105-15-167
