A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs

Bibliographic Details
Main Authors:	Chen, Wei; Zhang, Clarence K.; Cheng, Yongmei; Zhang, Shaowu; Zhao, Hongyu
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3742672/ https://www.ncbi.nlm.nih.gov/pubmed/23967117 http://dx.doi.org/10.1371/journal.pone.0070837

author	Chen, Wei; Zhang, Clarence K.; Cheng, Yongmei; Zhang, Shaowu; Zhao, Hongyu
collection	PubMed
description	Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach in using these data to explore the genetic diversity in a microbial community is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarities. The inferred OTUs can then be used to estimate species, diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster the sequences into OTUs, relatively little guidance is available on their relative performance and the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of the threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing both a dissimilarity threshold and an abundance threshold for OTU inference.
format	Online Article Text
id	pubmed-3742672
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3742672 2013-08-21 A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs. Chen, Wei; Zhang, Clarence K.; Cheng, Yongmei; Zhang, Shaowu; Zhao, Hongyu. PLoS One. Research Article. Public Library of Science 2013-08-13 /pmc/articles/PMC3742672/ /pubmed/23967117 http://dx.doi.org/10.1371/journal.pone.0070837 Text en © 2013 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs
topic	Research Article
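
The description above notes that the number of inferred OTUs depends on the dissimilarity threshold. The following is a minimal sketch of that effect, not a reproduction of any method the paper evaluates. It assumes reads that are already aligned to equal length, uses the fraction of mismatched positions as the dissimilarity, and applies a simple greedy seed-based assignment. The function names `dissimilarity` and `greedy_otus` and the toy reads are hypothetical and chosen only for illustration.

```python
def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def greedy_otus(sequences: list[str], threshold: float = 0.03) -> list[list[str]]:
    """Greedy seed-based clustering (an illustrative assumption, not the paper's method).

    Each sequence joins the first OTU whose seed (first member) lies within
    `threshold` dissimilarity; otherwise it becomes the seed of a new OTU.
    """
    otus: list[list[str]] = []
    for seq in sequences:
        for otu in otus:
            if dissimilarity(seq, otu[0]) <= threshold:
                otu.append(seq)
                break
        else:
            # No existing seed was close enough, so start a new OTU.
            otus.append([seq])
    return otus


# Toy, made-up reads of length 100 (not real 16S data).
reads = [
    "A" * 100,
    "A" * 98 + "CC",     # 2% different from the first read
    "A" * 95 + "CCCCC",  # 5% different from the first read
    "G" * 100,           # completely different
]

for t in (0.01, 0.03, 0.05):
    print(f"threshold={t:.2f}: {len(greedy_otus(reads, t))} OTUs")
```

With these toy reads, a stricter (lower) threshold splits the reads into more OTUs and a looser one merges them. Real pipelines compute distances from pairwise or multiple alignments and typically also take sequence abundance into account, which this sketch ignores.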
