Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences

With the advent of next-generation sequencing technology, it has become convenient and cost-efficient to thoroughly characterize the microbial diversity and taxonomic composition in various environmental samples. Millions of sequences can be generated, and how to utilize this enormous sequence...

Bibliographic Details
Main Authors: Wei, Ze-Gang, Zhang, Xiao-Dan, Cao, Ming, Liu, Fei, Qian, Yu, Zhang, Shao-Wu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Microbiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8024490/
https://www.ncbi.nlm.nih.gov/pubmed/33841367
http://dx.doi.org/10.3389/fmicb.2021.644012
author Wei, Ze-Gang
Zhang, Xiao-Dan
Cao, Ming
Liu, Fei
Qian, Yu
Zhang, Shao-Wu
collection PubMed
description With the advent of next-generation sequencing technology, it has become convenient and cost-efficient to thoroughly characterize the microbial diversity and taxonomic composition in various environmental samples. Millions of sequences can be generated, and how to utilize this enormous sequence resource has become a critical concern for microbial ecologists. One particular challenge is picking operational taxonomic units (OTUs) in 16S rRNA sequence analysis. Fortunately, this challenge can be addressed directly by sequence clustering, which attempts to group similar sequences. Numerous clustering methods have therefore been proposed to cluster 16S rRNA sequences into OTUs. However, each method has its own clustering mechanism, and different methods produce diverse outputs. Even a slight parameter change for the same method can generate distinct results, and choosing an appropriate method has become a challenge for inexperienced users. A lot of time and resources can be wasted in selecting clustering tools and analyzing the clustering results. In this study, we introduce recent advances in clustering methods for OTU picking, focusing mainly on three aspects: (i) the principles of existing clustering algorithms, (ii) benchmark dataset construction for OTU picking and evaluation metrics, and (iii) the performance of different methods with various distance thresholds on benchmark datasets. This paper aims to help biological researchers select suitable clustering methods for analyzing their collected sequences and to help algorithm developers design more efficient sequence clustering methods.
format Online
Article
Text
id pubmed-8024490
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8024490 2021-04-08 Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences Wei, Ze-Gang Zhang, Xiao-Dan Cao, Ming Liu, Fei Qian, Yu Zhang, Shao-Wu Front Microbiol Microbiology Frontiers Media S.A.
2021-03-24 /pmc/articles/PMC8024490/ /pubmed/33841367 http://dx.doi.org/10.3389/fmicb.2021.644012 Text en Copyright © 2021 Wei, Zhang, Cao, Liu, Qian and Zhang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8024490/
https://www.ncbi.nlm.nih.gov/pubmed/33841367
http://dx.doi.org/10.3389/fmicb.2021.644012