
Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling

BACKGROUND: Most of our knowledge about the remarkable microbial diversity on Earth comes from sequencing the 16S rRNA gene. The use of next-generation sequencing methods has increased sample number and sequencing depth, but the read length of the most widely used sequencing platforms today is quite...


Bibliographic Details
Main authors: Fuks, Garold, Elgart, Michael, Amir, Amnon, Zeisel, Amit, Turnbaugh, Peter J., Soen, Yoav, Shental, Noam
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5787238/
https://www.ncbi.nlm.nih.gov/pubmed/29373999
http://dx.doi.org/10.1186/s40168-017-0396-x
_version_ 1783295893446852608
author Fuks, Garold
Elgart, Michael
Amir, Amnon
Zeisel, Amit
Turnbaugh, Peter J.
Soen, Yoav
Shental, Noam
author_facet Fuks, Garold
Elgart, Michael
Amir, Amnon
Zeisel, Amit
Turnbaugh, Peter J.
Soen, Yoav
Shental, Noam
author_sort Fuks, Garold
collection PubMed
description BACKGROUND: Most of our knowledge about the remarkable microbial diversity on Earth comes from sequencing the 16S rRNA gene. The use of next-generation sequencing methods has increased sample number and sequencing depth, but the read length of the most widely used sequencing platforms today is quite short, requiring the researcher to choose a subset of the gene to sequence (typically 16–33% of the total length). Thus, many bacteria may share the same amplified region, and the resolution of profiling is inherently limited. Platforms that offer ultra-long read lengths, whole genome shotgun sequencing approaches, and computational frameworks previously suggested by us and by others all offer different ways to circumvent this problem, yet suffer from various shortcomings. There is a need for a simple and low-cost 16S rRNA gene-based profiling approach that harnesses the short read length to provide much larger coverage of the gene and thus allow high resolution, even in harsh conditions of low bacterial biomass and fragmented DNA. RESULTS: This manuscript proposes the Short MUltiple Regions Framework (SMURF), a method that combines sequencing results from different PCR-amplified regions into one coherent profile. The de facto amplicon length is the total length of all amplified regions, thus providing much higher resolution compared to current techniques. Computationally, the method solves a convex optimization problem that allows extremely fast reconstruction and requires only moderate memory. We demonstrate the increase in resolution by in silico simulations and by profiling two mock mixtures and real-world biological samples. Reanalyzing a mock mixture from the Human Microbiome Project achieved an approximately twofold improvement in resolution when combining two independent regions. 
Using a custom set of six primer pairs spanning about 1200 bp (80%) of the 16S rRNA gene, we were able to achieve a ~100-fold improvement in resolution compared to a single region, over a mock mixture of common human gut bacterial isolates. Finally, the profiling of a Drosophila melanogaster microbiome using the set of six primer pairs provided a ~100-fold increase in resolution, thus enabling efficient downstream analysis. CONCLUSIONS: SMURF enables the identification of near full-length 16S rRNA gene sequences in microbial communities, with resolution superior to that of current techniques. It may be applied to standard sample preparation protocols with very few modifications. SMURF also paves the way to high-resolution profiling of low-biomass and fragmented DNA, e.g., in the case of formalin-fixed and paraffin-embedded samples, fossil-derived DNA, or DNA exposed to other degrading conditions. The approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons, e.g., in multilocus sequence typing (MLST). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-017-0396-x) contains supplementary material, which is available to authorized users.
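The abstract's central observation, that many bacteria may share the same amplified region while a combination of regions can tell them apart, can be illustrated with a toy sketch. This is not the SMURF algorithm and does not involve the convex optimization step; the taxon names, region names, and sequences below are all hypothetical, and the function `ambiguity_groups` is an illustrative helper, not part of any published tool.

```python
from collections import defaultdict

# Hypothetical toy reference database: each taxon maps to its sequence
# in two amplified regions. All names and sequences are made up.
REFERENCE = {
    "taxon_A": {"region_1": "ACGT", "region_2": "TTGA"},
    "taxon_B": {"region_1": "ACGT", "region_2": "CCGA"},
    "taxon_C": {"region_1": "GGCT", "region_2": "CCGA"},
}


def ambiguity_groups(reference, regions):
    """Group taxa that are indistinguishable over the given regions.

    Two taxa fall into the same group when their sequences are identical
    in every listed region, so a profile based on those regions alone
    cannot separate them.
    """
    groups = defaultdict(list)
    for taxon, seqs in sorted(reference.items()):
        key = tuple(seqs[region] for region in regions)
        groups[key].append(taxon)
    return sorted(groups.values())


only_region_1 = ambiguity_groups(REFERENCE, ["region_1"])
only_region_2 = ambiguity_groups(REFERENCE, ["region_2"])
combined = ambiguity_groups(REFERENCE, ["region_1", "region_2"])

print(only_region_1)  # taxon_A and taxon_B are merged
print(only_region_2)  # taxon_B and taxon_C are merged
print(combined)       # every taxon is in its own group
```

In this toy case each single region leaves one pair of taxa unresolved, while the two regions together separate all three. The number of groups is only a crude stand-in for the notion of resolution used in the abstract.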
format Online
Article
Text
id pubmed-5787238
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5787238 2018-02-08 Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling Fuks, Garold Elgart, Michael Amir, Amnon Zeisel, Amit Turnbaugh, Peter J. Soen, Yoav Shental, Noam Microbiome Methodology BACKGROUND: Most of our knowledge about the remarkable microbial diversity on Earth comes from sequencing the 16S rRNA gene. The use of next-generation sequencing methods has increased sample number and sequencing depth, but the read length of the most widely used sequencing platforms today is quite short, requiring the researcher to choose a subset of the gene to sequence (typically 16–33% of the total length). Thus, many bacteria may share the same amplified region, and the resolution of profiling is inherently limited. Platforms that offer ultra-long read lengths, whole genome shotgun sequencing approaches, and computational frameworks previously suggested by us and by others all offer different ways to circumvent this problem, yet suffer from various shortcomings. There is a need for a simple and low-cost 16S rRNA gene-based profiling approach that harnesses the short read length to provide much larger coverage of the gene and thus allow high resolution, even in harsh conditions of low bacterial biomass and fragmented DNA. RESULTS: This manuscript proposes the Short MUltiple Regions Framework (SMURF), a method that combines sequencing results from different PCR-amplified regions into one coherent profile. The de facto amplicon length is the total length of all amplified regions, thus providing much higher resolution compared to current techniques. Computationally, the method solves a convex optimization problem that allows extremely fast reconstruction and requires only moderate memory. We demonstrate the increase in resolution by in silico simulations and by profiling two mock mixtures and real-world biological samples. 
Reanalyzing a mock mixture from the Human Microbiome Project achieved an approximately twofold improvement in resolution when combining two independent regions. Using a custom set of six primer pairs spanning about 1200 bp (80%) of the 16S rRNA gene, we were able to achieve a ~100-fold improvement in resolution compared to a single region, over a mock mixture of common human gut bacterial isolates. Finally, the profiling of a Drosophila melanogaster microbiome using the set of six primer pairs provided a ~100-fold increase in resolution, thus enabling efficient downstream analysis. CONCLUSIONS: SMURF enables the identification of near full-length 16S rRNA gene sequences in microbial communities, with resolution superior to that of current techniques. It may be applied to standard sample preparation protocols with very few modifications. SMURF also paves the way to high-resolution profiling of low-biomass and fragmented DNA, e.g., in the case of formalin-fixed and paraffin-embedded samples, fossil-derived DNA, or DNA exposed to other degrading conditions. The approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons, e.g., in multilocus sequence typing (MLST). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-017-0396-x) contains supplementary material, which is available to authorized users. BioMed Central 2018-01-26 /pmc/articles/PMC5787238/ /pubmed/29373999 http://dx.doi.org/10.1186/s40168-017-0396-x Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Fuks, Garold
Elgart, Michael
Amir, Amnon
Zeisel, Amit
Turnbaugh, Peter J.
Soen, Yoav
Shental, Noam
Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
title Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
title_full Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
title_fullStr Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
title_full_unstemmed Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
title_short Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling
title_sort combining 16s rrna gene variable regions enables high-resolution microbial community profiling
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5787238/
https://www.ncbi.nlm.nih.gov/pubmed/29373999
http://dx.doi.org/10.1186/s40168-017-0396-x
work_keys_str_mv AT fuksgarold combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling
AT elgartmichael combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling
AT amiramnon combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling
AT zeiselamit combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling
AT turnbaughpeterj combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling
AT soenyoav combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling
AT shentalnoam combining16srrnagenevariableregionsenableshighresolutionmicrobialcommunityprofiling