
SAG-QC: quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions

BACKGROUND: Whole genome amplification techniques have enabled the analysis of unexplored genomic information by sequencing of single-amplified genomes (SAGs). Whole genome amplification of single bacteria is currently challenging because contamination often occurs in experimental processes. Thus, t...


Bibliographic Details
Main Authors: Maruyama, Toru; Mori, Tetsushi; Yamagishi, Keisuke; Takeyama, Haruko
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5336615/
https://www.ncbi.nlm.nih.gov/pubmed/28259144
http://dx.doi.org/10.1186/s12859-017-1572-5
author Maruyama, Toru
Mori, Tetsushi
Yamagishi, Keisuke
Takeyama, Haruko
collection PubMed
description BACKGROUND: Whole genome amplification (WGA) techniques have enabled the analysis of unexplored genomic information through the sequencing of single-amplified genomes (SAGs). WGA of single bacteria remains challenging because contamination often occurs during the experimental process. Thus, to increase confidence in the analysis of sequenced SAGs, bioinformatics approaches that identify and exclude non-target sequences from SAGs are required. Because currently reported approaches rely on sequence information in public databases, they are limited when new strains are the targets of interest. Here, we developed SAG-QC, a software tool that identifies and excludes non-target sequences independently of databases. RESULTS: Our method uses “no template control” sequences acquired during WGA. We calculated the probability that a sequence was derived from contaminants by comparing its k-mer composition with that of the no template control sequences. In tests on simulated SAG datasets, our method predicted non-target sequences more accurately than currently reported techniques. Subsequently, we applied our tool to actual SAG datasets and evaluated the accuracy of the predictions. CONCLUSIONS: Our method distinguishes SAGs from non-target sequences without depending on public sequence information. It will be effective when applied to SAG sequences of unexplored strains, and we anticipate that it will contribute to the correct interpretation of SAGs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1572-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5336615
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5336615 2017-03-07 SAG-QC: quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions Maruyama, Toru; Mori, Tetsushi; Yamagishi, Keisuke; Takeyama, Haruko BMC Bioinformatics Software
BioMed Central 2017-03-04 /pmc/articles/PMC5336615/ /pubmed/28259144 http://dx.doi.org/10.1186/s12859-017-1572-5 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SAG-QC: quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions
topic Software