
Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome

Quality control for next-generation sequencing (NGS) has become increasingly important as omics studies rely ever more heavily on sequencing data. Tools have been developed for filtering possible contaminants from species with known reference genomes. Unfortunately, reference genomes fo...


Bibliographic Details
Main authors: Xi, Wang, Gao, Yan, Cheng, Zhangyu, Chen, Chaoyun, Han, Maozhen, Yang, Pengshuo, Xiong, Guangzhou, Ning, Kang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637319/
https://www.ncbi.nlm.nih.gov/pubmed/31354662
http://dx.doi.org/10.3389/fmicb.2019.01560
collection PubMed
description Quality control for next-generation sequencing (NGS) has become increasingly important as omics studies rely ever more heavily on sequencing data. Tools have been developed for filtering possible contaminants from species with known reference genomes. Unfortunately, these tools require reference genomes for all the species involved, including the contaminants. This excludes many real-life samples for which no complete genome of the target species is available and which are contaminated with unknown microbial species. In this work we propose QC-Blind, a novel quality control pipeline for removing contaminants without any use of reference genomes. The pipeline requires only information about a few marker genes of the target species. It consists of unsupervised read assembly, contig binning, read clustering, and marker gene assignment. When evaluated on in silico, ab initio, and in vivo datasets, QC-Blind proved effective in removing unknown contaminants with high specificity and accuracy while preserving most of the genomic information of the target bacterial species. QC-Blind could therefore serve well in situations where limited information is available for both target and contaminating species.
format Online
Article
Text
id pubmed-6637319
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6637319 2019-07-26 Front Microbiol Microbiology Frontiers Media S.A. 2019-07-09 /pmc/articles/PMC6637319/ /pubmed/31354662 http://dx.doi.org/10.3389/fmicb.2019.01560 Text en
Copyright © 2019 Xi, Gao, Cheng, Chen, Han, Yang, Xiong and Ning. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Microbiology