Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples

Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of th...

Bibliographic Details
Main Authors: Chouvarine, Philippe; Wiehlmann, Lutz; Moran Losada, Patricia; DeLuca, David S.; Tümmler, Burkhard
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5070866/
https://www.ncbi.nlm.nih.gov/pubmed/27760173
http://dx.doi.org/10.1371/journal.pone.0165015
author Chouvarine, Philippe
Wiehlmann, Lutz
Moran Losada, Patricia
DeLuca, David S.
Tümmler, Burkhard
collection PubMed
description Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without reliance on the knowledge of genotypes and phenotypes of the members of the bacterial community. It also makes it possible to overcome problems of 16S rDNA sequencing, such as unknown copy number of the 16S gene and lack of sufficient sequence similarity of the “universal” 16S primers to some of the target 16S genes. On the other hand, next-generation sequencing suffers from biases resulting in non-uniform coverage of the sequenced genomes. To overcome this difficulty, we present a model of GC-bias in sequencing metagenomic samples as well as filtration and normalization techniques necessary for accurate quantification of microbial organisms. While there has been substantial research in normalization and filtration of read-count data in such techniques as RNA-seq or ChIP-seq, to our knowledge, this has not been the case for the field of whole-metagenome shotgun sequencing. The presented methods assume that complete genome references are available for most microorganisms of interest present in metagenomic samples. This is often a valid assumption in such fields as medical diagnostics of patient microbiota. Testing the model on two validation datasets showed a four-fold reduction in root-mean-square error compared to non-normalized data in both cases. The presented methods can be applied to any pipeline for whole-metagenome sequencing analysis relying on complete microbial genome references. We demonstrate that such pre-processing reduces the number of false positive hits and increases the accuracy of abundance estimates.
format Online
Article
Text
id pubmed-5070866
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5070866 2016-10-27
PLoS One
Research Article
Public Library of Science 2016-10-19
Text en
© 2016 Chouvarine et al
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples
topic Research Article