An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur

16S rRNA is the universal gene of microbes, and it is often used as a target gene to obtain profiles of microbial communities via next-generation sequencing (NGS) technology. Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% threshold based on the taxonomic stan...

Bibliographic Details
Main Authors: Liu, Guang; Li, Tong; Zhu, Xiaoyan; Zhang, Xuanping; Wang, Jiayin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10408458/
https://www.ncbi.nlm.nih.gov/pubmed/37560524
http://dx.doi.org/10.3389/fmicb.2023.1178744
_version_ 1785086186787700736
author Liu, Guang
Li, Tong
Zhu, Xiaoyan
Zhang, Xuanping
Wang, Jiayin
author_facet Liu, Guang
Li, Tong
Zhu, Xiaoyan
Zhang, Xuanping
Wang, Jiayin
author_sort Liu, Guang
collection PubMed
description 16S rRNA is the universal gene of microbes, and it is often used as a target gene to obtain profiles of microbial communities via next-generation sequencing (NGS) technology. Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% threshold based on the taxonomic standard using 16S rRNA, and methods for the reduction of sequencing errors are bypassed, which may lead to false classification units. Several denoising algorithms have been published to solve this problem, such as DADA2 and Deblur, which can correct sequencing errors at single-nucleotide resolution by generating amplicon sequence variants (ASVs). As high-resolution ASVs are becoming more popular than OTUs and only one analysis method is usually selected in a particular study, there is a need for a thorough comparison of OTU clustering and denoising pipelines. In this study, three of the most widely used 16S rRNA methods (two denoising algorithms, DADA2 and Deblur, along with de novo OTU clustering) were thoroughly compared using 16S rRNA amplification sequencing data generated from 358 clinical stool samples from the Colorectal Cancer (CRC) Screening Cohort. Our findings indicated that all approaches led to similar taxonomic profiles (with P > 0.05 in PERMANOVA and P < 0.001 in the Mantel test), although the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Despite considerable differences in disease-related markers identified, disease-related analysis showed that all methods could result in similar conclusions. Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were identified by all three methods as enriched in the CRC group, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were identified by all three methods as enriched in the healthy group.
In addition, disease-diagnostic models generated using machine learning algorithms based on the data from these different methods all achieved good diagnostic efficiency (AUC: 0.87–0.89), with the model based on DADA2 producing the highest AUC (0.8944 and 0.8907 in the training set and test set, respectively). However, there was no significant difference in performance between the models (P > 0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering display similar power levels in taxa assignment and can produce similar conclusions in the case of the CRC cohort.
format Online
Article
Text
id pubmed-10408458
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10408458 2023-08-09 An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur Liu, Guang Li, Tong Zhu, Xiaoyan Zhang, Xuanping Wang, Jiayin Front Microbiol Microbiology 16S rRNA is the universal gene of microbes, and it is often used as a target gene to obtain profiles of microbial communities via next-generation sequencing (NGS) technology. Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% threshold based on the taxonomic standard using 16S rRNA, and methods for the reduction of sequencing errors are bypassed, which may lead to false classification units. Several denoising algorithms have been published to solve this problem, such as DADA2 and Deblur, which can correct sequencing errors at single-nucleotide resolution by generating amplicon sequence variants (ASVs). As high-resolution ASVs are becoming more popular than OTUs and only one analysis method is usually selected in a particular study, there is a need for a thorough comparison of OTU clustering and denoising pipelines. In this study, three of the most widely used 16S rRNA methods (two denoising algorithms, DADA2 and Deblur, along with de novo OTU clustering) were thoroughly compared using 16S rRNA amplification sequencing data generated from 358 clinical stool samples from the Colorectal Cancer (CRC) Screening Cohort. Our findings indicated that all approaches led to similar taxonomic profiles (with P > 0.05 in PERMANOVA and P < 0.001 in the Mantel test), although the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Despite considerable differences in disease-related markers identified, disease-related analysis showed that all methods could result in similar conclusions.
Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were identified by all three methods as enriched in the CRC group, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were identified by all three methods as enriched in the healthy group. In addition, disease-diagnostic models generated using machine learning algorithms based on the data from these different methods all achieved good diagnostic efficiency (AUC: 0.87–0.89), with the model based on DADA2 producing the highest AUC (0.8944 and 0.8907 in the training set and test set, respectively). However, there was no significant difference in performance between the models (P > 0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering display similar power levels in taxa assignment and can produce similar conclusions in the case of the CRC cohort. Frontiers Media S.A. 2023-07-25 /pmc/articles/PMC10408458/ /pubmed/37560524 http://dx.doi.org/10.3389/fmicb.2023.1178744 Text en Copyright © 2023 Liu, Li, Zhu, Zhang and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur
title_sort independent evaluation in a crc patient cohort of microbiome 16s rrna sequence analysis methods: otu clustering, dada2, and deblur
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10408458/
https://www.ncbi.nlm.nih.gov/pubmed/37560524
http://dx.doi.org/10.3389/fmicb.2023.1178744