
Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem

The 16S ribosomal RNA gene is the most widely used marker gene in microbial ecology. Counts of 16S sequence variants, often in PCR amplicons, are used to estimate proportions of bacterial and archaeal taxa in microbial communities. Because different organisms contain different 16S gene copy numbers...


Bibliographic Details
Main authors: Louca, Stilianos, Doebeli, Michael, Parfrey, Laura Wegener
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5828423/
https://www.ncbi.nlm.nih.gov/pubmed/29482646
http://dx.doi.org/10.1186/s40168-018-0420-9
_version_ 1783302642773000192
author Louca, Stilianos
Doebeli, Michael
Parfrey, Laura Wegener
collection PubMed
description The 16S ribosomal RNA gene is the most widely used marker gene in microbial ecology. Counts of 16S sequence variants, often in PCR amplicons, are used to estimate proportions of bacterial and archaeal taxa in microbial communities. Because different organisms contain different 16S gene copy numbers (GCNs), sequence variant counts are biased towards clades with greater GCNs. Several tools have recently been developed for predicting GCNs using phylogenetic methods and based on sequenced genomes, in order to correct for these biases. However, the accuracy of those predictions has not been independently assessed. Here, we systematically evaluate the predictability of 16S GCNs across bacterial and archaeal clades, based on ~6,800 public sequenced genomes and using several phylogenetic methods. Further, we assess the accuracy of GCNs predicted by three recently published tools (PICRUSt, CopyRighter, and PAPRICA) over a wide range of taxa and for 635 microbial communities from varied environments. We find that regardless of the phylogenetic method tested, 16S GCNs could only be accurately predicted for a limited fraction of taxa, namely taxa with closely to moderately related representatives (≲15% divergence in the 16S rRNA gene). Consistent with this observation, we find that all considered tools exhibit low predictive accuracy when evaluated against completely sequenced genomes, in some cases explaining less than 10% of the variance. Substantial disagreement was also observed between tools (R² < 0.5) for the majority of tested microbial communities. The nearest sequenced taxon index (NSTI) of microbial communities, i.e., the average distance to a sequenced genome, was a strong predictor for the agreement between GCN prediction tools on non-animal-associated samples, but only a moderate predictor for animal-associated samples.
We recommend against correcting for 16S GCNs in microbiome surveys by default, unless OTUs are sufficiently closely related to sequenced genomes or unless a need for true OTU proportions warrants the additional noise introduced, so that community profiles remain interpretable and comparable between studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0420-9) contains supplementary material, which is available to authorized users.
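The bias described in the abstract can be illustrated with a small sketch. It assumes the simplest possible correction: divide each taxon's 16S read count by its gene copy number, then renormalize to proportions. This is not presented as the procedure used by PICRUSt, CopyRighter, or PAPRICA; the function name, taxon labels, and copy numbers below are hypothetical and chosen only for illustration.

```python
def correct_for_gcn(counts, gcn):
    """Convert 16S read counts into GCN-corrected taxon proportions.

    counts: dict mapping taxon label -> observed 16S read count
    gcn:    dict mapping taxon label -> 16S gene copy number (assumed known)
    """
    missing = [taxon for taxon in counts if taxon not in gcn]
    if missing:
        raise KeyError(f"no copy number available for: {missing}")

    # A cell carrying k copies of the gene contributes k reads on average,
    # so dividing by k gives a quantity proportional to cell abundance.
    cell_estimates = {taxon: counts[taxon] / gcn[taxon] for taxon in counts}

    total = sum(cell_estimates.values())
    if total == 0:
        raise ValueError("all counts are zero; proportions are undefined")
    return {taxon: value / total for taxon, value in cell_estimates.items()}


# Hypothetical two-taxon community: taxon_A carries 7 copies, taxon_B carries 1.
counts = {"taxon_A": 700, "taxon_B": 300}
gcn = {"taxon_A": 7, "taxon_B": 1}

uncorrected = {taxon: n / sum(counts.values()) for taxon, n in counts.items()}
corrected = correct_for_gcn(counts, gcn)

print(uncorrected)  # taxon_A appears dominant in the raw reads
print(corrected)    # taxon_B dominates once copy numbers are accounted for
```

In this toy case the copy numbers are taken as known. The abstract's point is that for many taxa they can only be predicted, and an inaccurate predicted GCN would shift the corrected proportions in the same way an unmodelled bias does.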
format Online
Article
Text
id pubmed-5828423
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5828423 2018-03-01 Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem Louca, Stilianos Doebeli, Michael Parfrey, Laura Wegener Microbiome Short Report BioMed Central 2018-02-26 /pmc/articles/PMC5828423/ /pubmed/29482646 http://dx.doi.org/10.1186/s40168-018-0420-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem
topic Short Report