A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of...

Bibliographic Details
Main authors: Kim, Hyeonwoo; Kim, Jiwon; Choi, Ji Won; Ahn, Kwang-Sung; Park, Dong-Il; Kim, Sangsoo
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584646/
https://www.ncbi.nlm.nih.gov/pubmed/37813636
http://dx.doi.org/10.5808/gi.23044
author Kim, Hyeonwoo
Kim, Jiwon
Choi, Ji Won
Ahn, Kwang-Sung
Park, Dong-Il
Kim, Sangsoo
collection PubMed
description Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with a large number of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline’s performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.
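The read-retention percentages and the shared-OTU correlation reported in the description can be illustrated with a small calculation. The following is a minimal sketch, not the authors' code: the function names, OTU identifiers, and counts are hypothetical, and the use of Pearson correlation on log10 totals is an assumption, since the abstract does not say which correlation coefficient was computed.

```python
import math


def retention_rate(reads_retained, reads_input):
    """Fraction of input read pairs that survive processing."""
    if reads_input <= 0:
        raise ValueError("reads_input must be positive")
    return reads_retained / reads_input


def shared_otu_log_correlation(totals_a, totals_b):
    """Pearson correlation of log10 total abundance over shared OTUs.

    totals_a and totals_b map OTU identifiers to total read counts in
    two separately processed datasets. With closed-reference clustering
    the identifiers refer to the same reference, so they can be matched
    directly by key.
    """
    # Keep only OTUs seen (with a positive count) in both datasets.
    shared = sorted(
        otu
        for otu in totals_a.keys() & totals_b.keys()
        if totals_a[otu] > 0 and totals_b[otu] > 0
    )
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs")

    xs = [math.log10(totals_a[otu]) for otu in shared]
    ys = [math.log10(totals_b[otu]) for otu in shared]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("abundances are constant; correlation undefined")
    return cov / (spread_x * spread_y)


# Hypothetical totals for illustration only (not data from the article).
dataset_a = {"OTU_1": 1000, "OTU_2": 100, "OTU_3": 10, "OTU_4": 5}
dataset_b = {"OTU_1": 10000, "OTU_2": 1000, "OTU_3": 100, "OTU_5": 7}

rate = retention_rate(reads_retained=932, reads_input=1000)
r = shared_otu_log_correlation(dataset_a, dataset_b)
print(f"retention: {rate:.1%}, log-scale correlation: {r:.2f}")
```

Only OTUs present in both dictionaries enter the correlation, which mirrors the "shared OTUs" wording of the description; OTU_4 and OTU_5 in the toy data are ignored.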
format Online
Article
Text
id pubmed-10584646
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-10584646 2023-10-20 A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing Kim, Hyeonwoo Kim, Jiwon Choi, Ji Won Ahn, Kwang-Sung Park, Dong-Il Kim, Sangsoo Genomics Inform Original Article Korea Genome Organization 2023-07-31 /pmc/articles/PMC10584646/ /pubmed/37813636 http://dx.doi.org/10.5808/gi.23044 Text en (c) 2023, Korea Genome Organization https://creativecommons.org/licenses/by/4.0/ (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584646/
https://www.ncbi.nlm.nih.gov/pubmed/37813636
http://dx.doi.org/10.5808/gi.23044