
A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs

Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of the microbial community composition and structure. One common approach in using these data to explore the genetic diversity in a microbial community is to cluster the 16S rRNA sequences i...


Bibliographic Details
Main authors: Chen, Wei, Zhang, Clarence K., Cheng, Yongmei, Zhang, Shaowu, Zhao, Hongyu
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3742672/
https://www.ncbi.nlm.nih.gov/pubmed/23967117
http://dx.doi.org/10.1371/journal.pone.0070837
author Chen, Wei
Zhang, Clarence K.
Cheng, Yongmei
Zhang, Shaowu
Zhao, Hongyu
collection PubMed
description Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity of a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species, diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, little guidance is available on their relative performance or on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing thresholds for both dissimilarity and abundance in OTU inference.
format Online
Article
Text
id pubmed-3742672
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3742672 2013-08-21 A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs Chen, Wei Zhang, Clarence K. Cheng, Yongmei Zhang, Shaowu Zhao, Hongyu PLoS One Research Article
Public Library of Science 2013-08-13 /pmc/articles/PMC3742672/ /pubmed/23967117 http://dx.doi.org/10.1371/journal.pone.0070837 Text en © 2013 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs
topic Research Article
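
The description in this record says that the number of inferred OTUs depends on the dissimilarity threshold. The following is a minimal Python sketch of that idea only. It is not one of the ten methods evaluated in the article. It assumes pre-aligned reads of equal length, measures dissimilarity as the fraction of mismatched positions, and uses complete-linkage agglomerative clustering as one hierarchical variant. The toy reads, the `mutate` helper, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical substitution map, used only to build toy reads.
_SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq, positions):
    """Return a copy of seq with the base at each given position changed."""
    chars = list(seq)
    for pos in positions:
        chars[pos] = _SWAP[chars[pos]]
    return "".join(chars)


def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold):
    """Complete-linkage agglomerative clustering, stopped at `threshold`.

    Returns a list of clusters, each a list of indices into `reads`.
    Two clusters are merged only if their most distant pair of reads
    is no more dissimilar than `threshold`.
    """
    clusters = [[i] for i in range(len(reads))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most distant members.
        return max(dissimilarity(reads[i], reads[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        (p, q), dist = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        # p < q is guaranteed by combinations, so deleting q is safe.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy data: four aligned reads of length 100 (assumed, not from the article).
base = "ACGT" * 25
reads = [
    base,
    mutate(base, [0, 1]),            # 2% from base
    mutate(base, [10, 11, 12, 13]),  # 4% from base
    mutate(base, range(50, 70)),     # 20% from base
]

for threshold in (0.01, 0.03, 0.10):
    print(threshold, len(cluster_otus(reads, threshold)))
# prints:
# 0.01 4
# 0.03 3
# 0.1 2
```

On this toy input, a stricter (lower) threshold yields more OTUs and a looser one yields fewer. That is the sensitivity to threshold choice that the description highlights, although the toy data say nothing about which threshold is appropriate for real samples.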