Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome
Quality control for next generation sequencing (NGS) has become increasingly important with the ever increasing importance of sequencing data for omics studies. Tools have been developed for filtering possible contaminants from species with known reference genome. Unfortunately, reference genomes fo...
Main authors: | Xi, Wang; Gao, Yan; Cheng, Zhangyu; Chen, Chaoyun; Han, Maozhen; Yang, Pengshuo; Xiong, Guangzhou; Ning, Kang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637319/ https://www.ncbi.nlm.nih.gov/pubmed/31354662 http://dx.doi.org/10.3389/fmicb.2019.01560 |
_version_ | 1783436219343962112 |
---|---|
author | Xi, Wang Gao, Yan Cheng, Zhangyu Chen, Chaoyun Han, Maozhen Yang, Pengshuo Xiong, Guangzhou Ning, Kang |
author_facet | Xi, Wang Gao, Yan Cheng, Zhangyu Chen, Chaoyun Han, Maozhen Yang, Pengshuo Xiong, Guangzhou Ning, Kang |
author_sort | Xi, Wang |
collection | PubMed |
description | Quality control for next generation sequencing (NGS) has become increasingly important with the ever increasing importance of sequencing data for omics studies. Tools have been developed for filtering possible contaminants from species with known reference genome. Unfortunately, reference genomes for all the species involved, including the contaminants, are required for these tools to work. This precludes many real-life samples that have no information about the complete genome of the target species, and are contaminated with unknown microbial species. In this work we proposed QC-Blind, a novel quality control pipeline for removing contaminants without any use of reference genomes. The pipeline merely requires the information about a few marker genes of the target species. The entire pipeline consists of unsupervised read assembly, contig binning, read clustering, and marker gene assignment. When evaluated on in silico, ab initio and in vivo datasets, QC-Blind proved effective in removing unknown contaminants with high specificity and accuracy, while preserving most of the genomic information of the target bacterial species. Therefore, QC-Blind could serve well in situations where limited information is available for both target and contamination species. |
format | Online Article Text |
id | pubmed-6637319 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-66373192019-07-26 Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome Xi, Wang Gao, Yan Cheng, Zhangyu Chen, Chaoyun Han, Maozhen Yang, Pengshuo Xiong, Guangzhou Ning, Kang Front Microbiol Microbiology Quality control for next generation sequencing (NGS) has become increasingly important with the ever increasing importance of sequencing data for omics studies. Tools have been developed for filtering possible contaminants from species with known reference genome. Unfortunately, reference genomes for all the species involved, including the contaminants, are required for these tools to work. This precludes many real-life samples that have no information about the complete genome of the target species, and are contaminated with unknown microbial species. In this work we proposed QC-Blind, a novel quality control pipeline for removing contaminants without any use of reference genomes. The pipeline merely requires the information about a few marker genes of the target species. The entire pipeline consists of unsupervised read assembly, contig binning, read clustering, and marker gene assignment. When evaluated on in silico, ab initio and in vivo datasets, QC-Blind proved effective in removing unknown contaminants with high specificity and accuracy, while preserving most of the genomic information of the target bacterial species. Therefore, QC-Blind could serve well in situations where limited information is available for both target and contamination species. Frontiers Media S.A. 2019-07-09 /pmc/articles/PMC6637319/ /pubmed/31354662 http://dx.doi.org/10.3389/fmicb.2019.01560 Text en Copyright © 2019 Xi, Gao, Cheng, Chen, Han, Yang, Xiong and Ning. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Xi, Wang Gao, Yan Cheng, Zhangyu Chen, Chaoyun Han, Maozhen Yang, Pengshuo Xiong, Guangzhou Ning, Kang Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome |
title | Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome |
title_full | Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome |
title_fullStr | Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome |
title_full_unstemmed | Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome |
title_short | Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome |
title_sort | using qc-blind for quality control and contamination screening of bacteria dna sequencing data without reference genome |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637319/ https://www.ncbi.nlm.nih.gov/pubmed/31354662 http://dx.doi.org/10.3389/fmicb.2019.01560 |
work_keys_str_mv | AT xiwang usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT gaoyan usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT chengzhangyu usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT chenchaoyun usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT hanmaozhen usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT yangpengshuo usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT xiongguangzhou usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome AT ningkang usingqcblindforqualitycontrolandcontaminationscreeningofbacteriadnasequencingdatawithoutreferencegenome |
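
The description field names four stages: unsupervised read assembly, contig binning, read clustering, and marker gene assignment. The last two can be illustrated with a toy sketch. It is not the QC-Blind implementation: the function names (`kmers`, `pick_target_bin`, `keep_target_reads`), the exact k-mer matching, the value `k=8`, and all sequences are assumptions made for illustration. The sketch takes contigs and their bin labels as given, picks the bin that best matches the marker genes of the target species, and keeps only the reads that share a k-mer with that bin.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_target_bin(contigs, contig_bins, markers, k=8):
    """Pick the bin whose contigs share the most k-mers with the marker genes.

    contigs:     dict of contig name -> sequence (hypothetical assembly output)
    contig_bins: dict of contig name -> bin label (hypothetical binning output)
    markers:     list of marker gene sequences for the target species
    Returns None if no bin shares any k-mer with the markers.
    """
    marker_kmers = set()
    for marker in markers:
        marker_kmers |= kmers(marker, k)

    hits = Counter()
    for name, seq in contigs.items():
        hits[contig_bins[name]] += len(kmers(seq, k) & marker_kmers)

    if not hits:
        return None
    best_bin, best_hits = hits.most_common(1)[0]
    return best_bin if best_hits > 0 else None


def keep_target_reads(reads, contigs, contig_bins, target_bin, k=8):
    """Keep reads sharing at least one k-mer with a contig in the target bin."""
    target_kmers = set()
    for name, seq in contigs.items():
        if contig_bins[name] == target_bin:
            target_kmers |= kmers(seq, k)
    return [read for read in reads if kmers(read, k) & target_kmers]


# Made-up toy data: one target contig (bin 0) and one contaminant contig (bin 1).
contigs = {
    "c1": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "c2": "TTTTGGGGCCCCAAAATTTTGGGGCC",
}
contig_bins = {"c1": 0, "c2": 1}
markers = ["TAGCCGATAGCT"]  # assumed marker gene fragment of the target
reads = ["CGTTAGCCGATA", "GGGGCCCCAAAA"]

target_bin = pick_target_bin(contigs, contig_bins, markers)
kept = keep_target_reads(reads, contigs, contig_bins, target_bin)
print(target_bin, kept)
```

In practice each stage would be handled by dedicated assembly, binning, and alignment tools; exact k-mer sharing is used here only as a simple stand-in for sequence similarity.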