Assessment of quality control approaches for metagenomic data analysis

Currently there is an explosive increase of the next-generation sequencing (NGS) projects and related datasets, which have to be processed by Quality Control (QC) procedures before they could be utilized for omics analysis. QC procedure usually includes identification and filtration of sequencing ar...

Bibliographic Details
Main Authors: Zhou, Qian; Su, Xiaoquan; Ning, Kang
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4223665/
https://www.ncbi.nlm.nih.gov/pubmed/25376098
http://dx.doi.org/10.1038/srep06957
_version_ 1782343238810599424
author Zhou, Qian
Su, Xiaoquan
Ning, Kang
author_facet Zhou, Qian
Su, Xiaoquan
Ning, Kang
author_sort Zhou, Qian
collection PubMed
description Currently there is an explosive increase of the next-generation sequencing (NGS) projects and related datasets, which have to be processed by Quality Control (QC) procedures before they could be utilized for omics analysis. QC procedure usually includes identification and filtration of sequencing artifacts such as low-quality reads and contaminating reads, which would significantly affect and sometimes mislead downstream analysis. Quality control of NGS data for microbial communities is especially challenging. In this work, we have evaluated and compared the performance and effects of various QC pipelines on different types of metagenomic NGS data and from different angles, based on which general principles of using QC pipelines were proposed. Results based on both simulated and real metagenomic datasets have shown that: firstly, QC-Chain is superior in its ability for contamination identification for metagenomic NGS datasets with different complexities with high sensitivity and specificity. Secondly, the high performance computing engine enabled QC-Chain to achieve a significant reduction in processing time compared to other pipelines based on serial computing. Thirdly, QC-Chain could outperform other tools in benefiting downstream metagenomic data analysis.
format Online
Article
Text
id pubmed-4223665
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-42236652014-11-13 Assessment of quality control approaches for metagenomic data analysis Zhou, Qian Su, Xiaoquan Ning, Kang Sci Rep Article Currently there is an explosive increase of the next-generation sequencing (NGS) projects and related datasets, which have to be processed by Quality Control (QC) procedures before they could be utilized for omics analysis. QC procedure usually includes identification and filtration of sequencing artifacts such as low-quality reads and contaminating reads, which would significantly affect and sometimes mislead downstream analysis. Quality control of NGS data for microbial communities is especially challenging. In this work, we have evaluated and compared the performance and effects of various QC pipelines on different types of metagenomic NGS data and from different angles, based on which general principles of using QC pipelines were proposed. Results based on both simulated and real metagenomic datasets have shown that: firstly, QC-Chain is superior in its ability for contamination identification for metagenomic NGS datasets with different complexities with high sensitivity and specificity. Secondly, the high performance computing engine enabled QC-Chain to achieve a significant reduction in processing time compared to other pipelines based on serial computing. Thirdly, QC-Chain could outperform other tools in benefiting downstream metagenomic data analysis. Nature Publishing Group 2014-11-07 /pmc/articles/PMC4223665/ /pubmed/25376098 http://dx.doi.org/10.1038/srep06957 Text en Copyright © 2014, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by-nc-sa/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/
spellingShingle Article
Zhou, Qian
Su, Xiaoquan
Ning, Kang
Assessment of quality control approaches for metagenomic data analysis
title Assessment of quality control approaches for metagenomic data analysis
title_full Assessment of quality control approaches for metagenomic data analysis
title_fullStr Assessment of quality control approaches for metagenomic data analysis
title_full_unstemmed Assessment of quality control approaches for metagenomic data analysis
title_short Assessment of quality control approaches for metagenomic data analysis
title_sort assessment of quality control approaches for metagenomic data analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4223665/
https://www.ncbi.nlm.nih.gov/pubmed/25376098
http://dx.doi.org/10.1038/srep06957
work_keys_str_mv AT zhouqian assessmentofqualitycontrolapproachesformetagenomicdataanalysis
AT suxiaoquan assessmentofqualitycontrolapproachesformetagenomicdataanalysis
AT ningkang assessmentofqualitycontrolapproachesformetagenomicdataanalysis