AfterQC: automatic filtering, trimming, error removing and quality control for fastq data
BACKGROUND: Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the...
Main authors: | Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374548/ https://www.ncbi.nlm.nih.gov/pubmed/28361673 http://dx.doi.org/10.1186/s12859-017-1469-3 |
_version_ | 1782518909555965952 |
---|---|
author | Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia |
author_facet | Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia |
author_sort | Chen, Shifu |
collection | PubMed |
description | BACKGROUND: Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling. RESULTS: For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer’s bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. 
CONCLUSION: Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and require no argument in most cases. |
format | Online Article Text |
id | pubmed-5374548 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53745482017-03-31 AfterQC: automatic filtering, trimming, error removing and quality control for fastq data Chen, Shifu Huang, Tanxiao Zhou, Yanqing Han, Yue Xu, Mingyan Gu, Jia BMC Bioinformatics Research BACKGROUND: Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling. RESULTS: For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer’s bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. 
Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. CONCLUSION: Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and require no argument in most cases. BioMed Central 2017-03-14 /pmc/articles/PMC5374548/ /pubmed/28361673 http://dx.doi.org/10.1186/s12859-017-1469-3 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Chen, Shifu Huang, Tanxiao Zhou, Yanqing Han, Yue Xu, Mingyan Gu, Jia AfterQC: automatic filtering, trimming, error removing and quality control for fastq data |
title | AfterQC: automatic filtering, trimming, error removing and quality control for fastq data |
title_full | AfterQC: automatic filtering, trimming, error removing and quality control for fastq data |
title_fullStr | AfterQC: automatic filtering, trimming, error removing and quality control for fastq data |
title_full_unstemmed | AfterQC: automatic filtering, trimming, error removing and quality control for fastq data |
title_short | AfterQC: automatic filtering, trimming, error removing and quality control for fastq data |
title_sort | afterqc: automatic filtering, trimming, error removing and quality control for fastq data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374548/ https://www.ncbi.nlm.nih.gov/pubmed/28361673 http://dx.doi.org/10.1186/s12859-017-1469-3 |
work_keys_str_mv | AT chenshifu afterqcautomaticfilteringtrimmingerrorremovingandqualitycontrolforfastqdata AT huangtanxiao afterqcautomaticfilteringtrimmingerrorremovingandqualitycontrolforfastqdata AT zhouyanqing afterqcautomaticfilteringtrimmingerrorremovingandqualitycontrolforfastqdata AT hanyue afterqcautomaticfilteringtrimmingerrorremovingandqualitycontrolforfastqdata AT xumingyan afterqcautomaticfilteringtrimmingerrorremovingandqualitycontrolforfastqdata AT gujia afterqcautomaticfilteringtrimmingerrorremovingandqualitycontrolforfastqdata |
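
The description above says that AfterQC analyses the overlap of paired-end reads and corrects wrong bases in the overlapping region. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea, not AfterQC's actual implementation. The function names, the thresholds (`min_overlap`, `max_mismatch_rate`, `min_qual_gap`) and the rule "keep the base with the clearly higher quality score" are all assumptions made for illustration. The sketch covers only the case where the tail of read 1 overlaps the head of the reverse-complemented read 2; the adapter case (insert shorter than the read length) is not handled.

```python
# Hypothetical sketch of overlap-based base correction for one read pair.
# None of these names or thresholds come from AfterQC itself.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2_rc, min_overlap=10, max_mismatch_rate=0.1):
    """Find where r2_rc (read 2, already reverse-complemented) starts in r1.

    Tries offsets from left to right, so the longest overlap is tested
    first. Returns the offset into r1, or None if no overlap is accepted.
    """
    for offset in range(0, len(r1) - min_overlap + 1):
        overlap_len = min(len(r1) - offset, len(r2_rc))
        if overlap_len < min_overlap:
            break
        mismatches = sum(
            a != b
            for a, b in zip(r1[offset:offset + overlap_len], r2_rc[:overlap_len])
        )
        if mismatches <= max_mismatch_rate * overlap_len:
            return offset
    return None


def correct_overlap(r1, q1, r2_rc, q2_rc, offset, min_qual_gap=10):
    """Resolve mismatches in the overlap in favour of the higher-quality base.

    q1 and q2_rc are lists of integer Phred scores; q2_rc must be reversed
    so that it lines up with r2_rc. A mismatch is corrected only when the
    quality difference is at least min_qual_gap (an assumed threshold).
    Returns (corrected r1, corrected r2_rc, number of corrected bases).
    """
    b1, b2 = list(r1), list(r2_rc)
    corrected = 0
    for i in range(min(len(b1) - offset, len(b2))):
        j = offset + i
        if b1[j] == b2[i]:
            continue
        if q1[j] - q2_rc[i] >= min_qual_gap:
            b2[i] = b1[j]
            corrected += 1
        elif q2_rc[i] - q1[j] >= min_qual_gap:
            b1[j] = b2[i]
            corrected += 1
    return "".join(b1), "".join(b2), corrected


# Toy example: a 30-base fragment sequenced as two 20-base reads.
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGG"
read1 = fragment[:15] + "T" + fragment[16:20]  # error at position 15 (C -> T)
qual1 = [30] * 20
qual1[15] = 10                                 # the wrong base has low quality
read2_rc = fragment[10:]                       # read 2 after reverse complement
qual2_rc = [30] * 20

offset = find_overlap(read1, read2_rc)
fixed_r1, fixed_r2_rc, n_fixed = correct_overlap(
    read1, qual1, read2_rc, qual2_rc, offset
)
print(offset, n_fixed, fixed_r1 == fragment[:20])
```

In this toy pair the two reads share a 10-base overlap, the single low-quality mismatch in read 1 is replaced by the high-quality base from read 2, and the corrected read 1 matches the original fragment. A real tool would also have to deal with indels, ambiguous overlaps and mismatches where both qualities are similar, which this sketch simply leaves uncorrected.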