
RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data

BACKGROUND: RNA-Seq has become one of the most widely used applications based on next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected t...


Bibliographic Details
Main authors: Zhou, Qian; Su, Xiaoquan; Jing, Gongchao; Chen, Songlin; Ning, Kang
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813327/
https://www.ncbi.nlm.nih.gov/pubmed/29444661
http://dx.doi.org/10.1186/s12864-018-4503-6
author Zhou, Qian
Su, Xiaoquan
Jing, Gongchao
Chen, Songlin
Ning, Kang
author_sort Zhou, Qian
collection PubMed
description BACKGROUND: RNA-Seq has become one of the most widely used applications based on next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected to rigorous quality control (QC) procedures before downstream analysis. Currently, an accurate and complete QC of RNA-Seq data requires a suite of different QC tools used consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results. RESULTS: We developed a comprehensive, fast and easy-to-use QC pipeline for RNA-Seq data, RNA-QC-Chain, which involves three steps: (1) sequencing-quality assessment and trimming; (2) internal (ribosomal RNAs) and external (reads from foreign species) contamination filtering; (3) alignment statistics reporting (such as read number, alignment coverage, sequencing depth and paired-end read mapping information). This package was developed based on our previously reported tool for general QC of next-generation sequencing (NGS) data, called QC-Chain, with extensions specifically designed for RNA-Seq data. It has several features that are not yet available in other QC tools for RNA-Seq data, such as RNA sequence trimming, automatic rRNA detection and automatic contaminating species identification. The three QC steps can run either sequentially or independently, making RNA-QC-Chain a comprehensive package with high flexibility and usability. Moreover, parallel computing and optimizations are embedded in most of the QC procedures, providing superior efficiency. The performance of RNA-QC-Chain has been evaluated with different types of datasets, including an in-house sequencing dataset, a semi-simulated dataset, and two real datasets downloaded from a public database. Comparisons of RNA-QC-Chain with other QC tools have demonstrated its advantages in both functional versatility and processing speed. CONCLUSIONS: We present here a tool, RNA-QC-Chain, which can be used to carry out the quality control of RNA-Seq data comprehensively, effectively and efficiently.
format Online
Article
Text
id pubmed-5813327
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5813327 2018-02-16 RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data Zhou, Qian Su, Xiaoquan Jing, Gongchao Chen, Songlin Ning, Kang BMC Genomics Software BioMed Central 2018-02-14 /pmc/articles/PMC5813327/ /pubmed/29444661 http://dx.doi.org/10.1186/s12864-018-4503-6 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Zhou, Qian
Su, Xiaoquan
Jing, Gongchao
Chen, Songlin
Ning, Kang
RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data
title RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813327/
https://www.ncbi.nlm.nih.gov/pubmed/29444661
http://dx.doi.org/10.1186/s12864-018-4503-6