FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics

BACKGROUND: High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data...

Bibliographic Details
Main authors:	Chanumolu, Sree K., Albahrani, Mustafa, Otu, Hasan H.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6694608/ https://www.ncbi.nlm.nih.gov/pubmed/31416440 http://dx.doi.org/10.1186/s12859-019-3015-y

author	Chanumolu, Sree K. Albahrani, Mustafa Otu, Hasan H.
collection	PubMed
description	BACKGROUND: High throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used, and generates very large amounts of data, mainly due to reduced cost and advanced technologies. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine but important task. Identification and elimination of low-quality sequence data is crucial for reliability of downstream analysis results. There is a need for a high-speed tool that uses optimized parallel programming for batch processing and simply gauges the quality of sequencing data from multiple datasets independent of any other processing steps. RESULTS: FQStat is a stand-alone, platform-independent software tool that assesses the quality of FASTQ files using parallel programming. Based on the machine architecture and input data, FQStat automatically determines the number of cores and the amount of memory to be allocated per file for optimum performance. Our results indicate that in a core-limited case, core assignment overhead exceeds the benefit of additional cores. In a core-unlimited case, there is a saturation point reached in performance by increasingly assigning additional cores per file. We also show that memory allocation per file has a lower priority in performance when compared to the allocation of cores. FQStat’s output is summarized in HTML web page, tab-delimited text file, and high-resolution image formats. FQStat calculates and plots read count, read length, quality score, and high-quality base statistics. FQStat identifies and marks low-quality sequencing data to suggest removal from downstream analysis. We applied FQStat on real sequencing data to optimize performance and to demonstrate its capabilities. We also compared FQStat’s performance to similar quality control (QC) tools that utilize parallel programming and attained improvements in run time. CONCLUSIONS: FQStat is a user-friendly tool with a graphical interface that employs a parallel programming architecture and automatically optimizes its performance to generate quality control statistics for sequencing data. Unlike existing tools, these statistics are calculated for multiple datasets and separately at the “lane,” “sample,” and “experiment” level to identify subsets of the samples with low quality, thereby preventing the loss of complete samples when reliable data can still be obtained. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3015-y) contains supplementary material, which is available to authorized users.
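The per-file statistics named in the description (read count, read length, quality score, high-quality bases) can be illustrated with a minimal Python sketch. This is not FQStat's implementation: the function names, the Phred+33 encoding, the Q30 threshold for a "high-quality" base, and the one-process-per-file scheme are all assumptions made here for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

PHRED_OFFSET = 33   # assumption: Sanger / Illumina 1.8+ quality encoding
HQ_THRESHOLD = 30   # assumption: a "high-quality" base has Phred score >= 30


def fastq_stats(lines):
    """Summarize an iterable of FASTQ lines (four lines per record)."""
    reads = bases = scored = qual_sum = hq_bases = 0
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:          # end of input or truncated record
            break
        seq = record[1].strip()
        quals = [ord(c) - PHRED_OFFSET for c in record[3].strip()]
        reads += 1
        bases += len(seq)
        scored += len(quals)
        qual_sum += sum(quals)
        hq_bases += sum(q >= HQ_THRESHOLD for q in quals)
    return {
        "reads": reads,
        "mean_length": bases / reads if reads else 0.0,
        "mean_quality": qual_sum / scored if scored else 0.0,
        "hq_fraction": hq_bases / scored if scored else 0.0,
    }


def file_stats(path):
    """Compute statistics for one uncompressed FASTQ file."""
    with open(path) as handle:
        return path, fastq_stats(handle)


def batch_stats(paths, workers=None):
    """Process several files in parallel, one worker process per file."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(file_stats, paths))
```

Calling `batch_stats(["a.fastq", "b.fastq"])` (hypothetical file names) from a script guarded by `if __name__ == "__main__":` would return a dictionary keyed by path. The sketch parallelizes only across files; assigning several cores to a single file, as the description discusses, would require splitting each file into chunks.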
format	Online Article Text
id	pubmed-6694608
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6694608 2019-08-19 FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics Chanumolu, Sree K. Albahrani, Mustafa Otu, Hasan H. BMC Bioinformatics Software BioMed Central 2019-08-15 /pmc/articles/PMC6694608/ /pubmed/31416440 http://dx.doi.org/10.1186/s12859-019-3015-y Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6694608/ https://www.ncbi.nlm.nih.gov/pubmed/31416440 http://dx.doi.org/10.1186/s12859-019-3015-y