A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of...

Bibliographic Details
Main Authors:	Kim, Hyeonwoo; Kim, Jiwon; Choi, Ji Won; Ahn, Kwang-Sung; Park, Dong-Il; Kim, Sangsoo
Format:	Online Article Text
Language:	English
Published:	Korea Genome Organization 2023
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584646/ https://www.ncbi.nlm.nih.gov/pubmed/37813636 http://dx.doi.org/10.5808/gi.23044

_version_	1785122783229902848
author	Kim, Hyeonwoo Kim, Jiwon Choi, Ji Won Ahn, Kwang-Sung Park, Dong-Il Kim, Sangsoo
author_facet	Kim, Hyeonwoo Kim, Jiwon Choi, Ji Won Ahn, Kwang-Sung Park, Dong-Il Kim, Sangsoo
author_sort	Kim, Hyeonwoo
collection	PubMed
description	Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline’s performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.
format	Online Article Text
id	pubmed-10584646
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Korea Genome Organization
record_format	MEDLINE/PubMed
spelling	pubmed-10584646 2023-10-20 A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing Kim, Hyeonwoo Kim, Jiwon Choi, Ji Won Ahn, Kwang-Sung Park, Dong-Il Kim, Sangsoo Genomics Inform Original Article Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline’s performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted. Korea Genome Organization 2023-07-31 /pmc/articles/PMC10584646/ /pubmed/37813636 http://dx.doi.org/10.5808/gi.23044 Text en (c) 2023, Korea Genome Organization https://creativecommons.org/licenses/by/4.0/ (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Article Kim, Hyeonwoo Kim, Jiwon Choi, Ji Won Ahn, Kwang-Sung Park, Dong-Il Kim, Sangsoo A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
title	A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
title_full	A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
title_fullStr	A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
title_full_unstemmed	A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
title_short	A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing
title_sort	streamlined pipeline based on hmmufotu for microbial community profiling using 16s rrna amplicon sequencing
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584646/ https://www.ncbi.nlm.nih.gov/pubmed/37813636 http://dx.doi.org/10.5808/gi.23044
work_keys_str_mv	AT kimhyeonwoo astreamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT kimjiwon astreamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT choijiwon astreamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT ahnkwangsung astreamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT parkdongil astreamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT kimsangsoo astreamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT kimhyeonwoo streamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT kimjiwon streamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT choijiwon streamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT ahnkwangsung streamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT parkdongil streamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing AT kimsangsoo streamlinedpipelinebasedonhmmufotuformicrobialcommunityprofilingusing16srrnaampliconsequencing
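
The abstract reports a correlation coefficient of 0.92 in log-scale total abundance for OTUs shared between the two separately processed datasets. The record does not say how that figure was computed, so the following is only a minimal sketch of one plausible reading: it assumes a Pearson correlation, a base-10 logarithm, and per-dataset OTU totals held in plain dictionaries. The function name and the example OTU identifiers and counts are hypothetical, not taken from the article.

```python
import math


def shared_otu_log_correlation(counts_a, counts_b):
    """Pearson correlation of log10 total abundance over OTUs found in both tables.

    counts_a, counts_b: dict mapping OTU id -> total read count in that dataset.
    Assumption: Pearson on log10 counts; the article may have used another variant.
    """
    # Closed-reference OTU ids are comparable across runs, so a key
    # intersection is enough to find the shared OTUs.
    shared = sorted(
        otu for otu in counts_a.keys() & counts_b.keys()
        if counts_a[otu] > 0 and counts_b[otu] > 0
    )
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs with non-zero counts")

    xs = [math.log10(counts_a[otu]) for otu in shared]
    ys = [math.log10(counts_b[otu]) for otu in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("abundances are constant in one dataset")

    return shared, cov / math.sqrt(var_x * var_y)


# Hypothetical totals for two datasets; ids and counts are illustrative only.
dataset_1 = {"otu_1": 10, "otu_2": 100, "otu_3": 1000, "otu_only_in_1": 5}
dataset_2 = {"otu_1": 20, "otu_2": 200, "otu_3": 2000, "otu_only_in_2": 7}
shared_ids, r = shared_otu_log_correlation(dataset_1, dataset_2)
print(shared_ids, round(r, 3))
```

Because the identifiers come from a fixed reference, no re-clustering is needed before the comparison; with a de novo or ASV method the two tables would first have to be reconciled at the sequence level.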
