Rapid evaluation and quality control of next generation sequencing data with FaQCs

Bibliographic details
Main authors: Lo, Chien-Chi; Chain, Patrick S G
Format: Online article, text
Language: English
Published: BioMed Central 2014
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4246454/
https://www.ncbi.nlm.nih.gov/pubmed/25408143
http://dx.doi.org/10.1186/s12859-014-0366-2

Description:
BACKGROUND: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform’s sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects.
RESULTS: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics.
CONCLUSION: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0366-2) contains supplementary material, which is available to authorized users.
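Two operations named in the abstract are removing poor-quality data from read ends and converting between FASTQ formats. The minimal Python sketch below illustrates both as generic techniques, under stated assumptions. It is not FaQCs's code, and it does not claim to reproduce FaQCs's trimming method.

- The trimming rule is the BWA-style running-sum heuristic, swapped in here for illustration.
- The conversion assumes Phred+64 input (Illumina 1.3 to 1.7) and Phred+33 output (Sanger).
- The function names `trim_3prime` and `phred64_to_phred33` are hypothetical.
- The default cutoff of 20 is also a hypothetical choice.

```python
from typing import Tuple

# Illustrative sketch only; this is not FaQCs's source code.
# Function names and the default cutoff are hypothetical choices.


def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string (Illumina 1.3-1.7) as Phred+33 (Sanger)."""
    # The Phred score is unchanged; only the ASCII offset moves from 64 to 33.
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def trim_3prime(seq: str, qual: str, cutoff: int = 20,
                offset: int = 33) -> Tuple[str, str]:
    """Trim a read's low-quality 3' end with a BWA-style running sum.

    Scanning from the last base toward the 5' end, accumulate (cutoff - Q).
    The read is cut where this sum peaks; the scan stops as soon as the sum
    turns negative, i.e. once good bases outweigh the poor tail.
    """
    scores = [ord(c) - offset for c in qual]
    running = 0
    best = 0
    keep = len(scores)  # number of leading bases to keep
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # Seven Q40 bases ('I') followed by three Q2 bases ('#'), Phred+33.
    print(trim_3prime("ACGTACGTAC", "IIIIIII###"))  # ('ACGTACG', 'IIIIIII')
    print(phred64_to_phred33("hhB"))  # II#
```

A production trimmer would typically also trim the 5' end and discard reads that fall below a minimum length after trimming. Both steps are omitted here to keep the sketch short.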

Source: BMC Bioinformatics (Software article), BioMed Central, published 2014-11-19
Copyright: © Lo and Chain; licensee BioMed Central Ltd. 2014
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.