QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data


Bibliographic Details
Main authors: Zhou, Qian, Su, Xiaoquan, Wang, Anhui, Xu, Jian, Ning, Kang
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3615005/
https://www.ncbi.nlm.nih.gov/pubmed/23565205
http://dx.doi.org/10.1371/journal.pone.0060234
author Zhou, Qian
Su, Xiaoquan
Wang, Anhui
Xu, Jian
Ning, Kang
collection PubMed
description Next-generation sequencing (NGS) technologies have been widely used in the life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, have been found to be quite common in raw sequencing data and compromise downstream analysis. Quality control (QC) is therefore essential for raw NGS data. Although a few NGS data QC tools are publicly available, they have two limitations. First, their processing speed cannot keep pace with the rapid increase in data volume. Second, with respect to removing contaminating reads, none of them can identify contaminating sources de novo; they rely heavily on prior information about the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. The tool synergistically combines user-friendly tools for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read processing tool; and (2) identification, quantification and filtration of unknown contamination to obtain high-quality clean reads. It is optimized for parallel computation, so its processing speed is significantly higher than that of other QC methods. Experiments on simulated and real NGS data have shown that reads with low sequencing quality could be identified and filtered. Possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw reads and processed reads also showed that the results of subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) based on processed reads improved significantly in completeness and accuracy. With regard to processing speed, QC-Chain achieves a 7–8-fold speed-up through parallel computation compared with traditional methods.
Therefore, QC-Chain is a fast and useful quality control tool for read quality processing and de novo contamination filtration of NGS reads, which could significantly facilitate downstream analysis. QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html.
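To illustrate the kind of read-level quality assessment and trimming the abstract refers to, here is a minimal Python sketch. It assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding and uses hypothetical thresholds (`min_q`, `min_len`, `min_mean_q`). It shows a generic 3'-end trimming and mean-quality filtering rule and is not the method implemented in QC-Chain or Parallel-QC.

```python
"""Generic Phred-based 3'-end trimming and read filtering (illustrative sketch).

Assumption: this is a textbook trimming rule, not the algorithm used by
QC-Chain or Parallel-QC. Thresholds are hypothetical defaults.
"""
from __future__ import annotations

PHRED_OFFSET = 33  # assumed Sanger / Illumina 1.8+ encoding


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def passes_filter(quality: str, min_len: int = 30, min_mean_q: float = 20.0) -> bool:
    """Keep a (trimmed) read only if it is long enough and its mean quality is high."""
    if not quality or len(quality) < min_len:
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    qual = "IIIIIIII##"  # 'I' = Q40, '#' = Q2 with offset 33
    trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
    print(trimmed_seq, trimmed_qual)  # ACGTACGT IIIIIIII
    print(passes_filter(trimmed_qual, min_len=5))  # True
```

Trimming from the 3' end reflects the common observation that base quality tends to decline toward the end of short reads; the filter step then discards reads left too short or too noisy after trimming.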
format Online
Article
Text
id pubmed-3615005
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3615005 2013-04-05 PLoS One Research Article Public Library of Science 2013-04-02 /pmc/articles/PMC3615005/ /pubmed/23565205 http://dx.doi.org/10.1371/journal.pone.0060234 Text en © 2013 Zhou et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data
topic Research Article