
Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

Bibliographic Details
Main authors: Franzén, Oscar; Hu, Jianzhong; Bao, Xiuliang; Itzkowitz, Steven H.; Peter, Inga; Bashir, Ali
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4593230/
https://www.ncbi.nlm.nih.gov/pubmed/26434730
http://dx.doi.org/10.1186/s40168-015-0105-6
author Franzén, Oscar
Hu, Jianzhong
Bao, Xiuliang
Itzkowitz, Steven H.
Peter, Inga
Bashir, Ali
collection PubMed
description BACKGROUND: High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering. RESULTS: In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq and Blautia-specific SMRT sequencing, further supporting the notion that long reads can identify additional OTUs. We implemented a complete-linkage hierarchical clustering strategy using a flexible computational pipeline, tailored specifically for PacBio circular consensus sequencing (CCS) data that outperforms heuristic methods in most settings: https://github.com/oscar-franzen/oclust/. CONCLUSION: Our data demonstrate that long reads can improve OTU inference; however, the choice of clustering algorithm and associated clustering thresholds has significant impact on performance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40168-015-0105-6) contains supplementary material, which is available to authorized users.
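The abstract contrasts heuristic clustering with complete-linkage hierarchical clustering for OTU inference. The sketch below is a minimal, hypothetical illustration of why the two can disagree; it is not the oclust pipeline and does not reproduce any specific heuristic tool. It assumes sequences are already aligned to equal length, uses a simple p-distance (fraction of mismatched columns, skipping columns where both sequences have a gap), and uses a toy threshold of 0.1 chosen for the example; 16S workflows often use thresholds near 3% dissimilarity. The names `p_distance`, `greedy_otus` and `complete_linkage_otus` are illustrative only.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing columns between two aligned sequences.

    Columns where both sequences contain a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def greedy_otus(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Heuristic centroid clustering (generic stand-in, not a specific tool).

    Each sequence joins the first existing centroid within `threshold`,
    otherwise it seeds a new OTU. Members may end up farther than
    `threshold` from each other.
    """
    centroids: list[tuple[str, set[str]]] = []
    for sid, s in seqs.items():
        for cid, members in centroids:
            if p_distance(seqs[cid], s) <= threshold:
                members.add(sid)
                break
        else:
            centroids.append((sid, {sid}))
    return [members for _, members in centroids]


def complete_linkage_otus(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Agglomerative complete-linkage clustering cut at `threshold`.

    The distance between two clusters is the maximum pairwise distance
    between their members; the closest pair of clusters is merged until
    that distance exceeds `threshold`, so every OTU has diameter <= threshold.
    """
    ids = list(seqs)
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(ids, 2)
    }
    clusters = [{i} for i in ids]

    def linkage(c1: set[str], c2: set[str]) -> float:
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(clusters[i], clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    toy = {
        "A": "AAAAAAAAAA",
        "B": "CAAAAAAAAA",  # 1 mismatch vs A
        "C": "AAAAAAAAAC",  # 1 mismatch vs A, 2 vs B
    }
    print("greedy:", [sorted(c) for c in greedy_otus(toy, 0.1)])
    print("complete-linkage:", [sorted(c) for c in complete_linkage_otus(toy, 0.1)])
```

On this toy input the greedy pass places all three sequences in one OTU centred on A, even though B and C differ by 0.2, whereas complete linkage keeps {A, B} and {C} apart because it never allows an OTU whose diameter exceeds the threshold. The naive loop is roughly cubic in the number of sequences and is only meant to show the linkage rule.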
format Online
Article
Text
id pubmed-4593230
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4593230 2015-10-06 Microbiome Methodology BioMed Central 2015-10-05 Text en
© Franzén et al. 2015. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4593230/
https://www.ncbi.nlm.nih.gov/pubmed/26434730
http://dx.doi.org/10.1186/s40168-015-0105-6