DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs
Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences...
Main authors: | Wei, Ze-Gang; Zhang, Shao-Wu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422886/ https://www.ncbi.nlm.nih.gov/pubmed/30915052 http://dx.doi.org/10.3389/fmicb.2019.00428 |
_version_ | 1783404432889741312 |
---|---|
author | Wei, Ze-Gang; Zhang, Shao-Wu |
author_facet | Wei, Ze-Gang; Zhang, Shao-Wu |
author_sort | Wei, Ze-Gang |
collection | PubMed |
description | Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is often the first step for many downstream analyses. Heuristic clustering is one of the most widely employed approaches for generating OTUs. However, most heuristic OTU clustering methods select a single seed sequence to represent each cluster, so their outcomes suffer from either overestimation of the number of OTUs or sensitivity to sequencing errors. In this paper, we present a novel dynamic multi-seeds clustering method (namely DMSC) to pick OTUs. DMSC first heuristically generates clusters according to the distance threshold. When the size of a cluster reaches the pre-defined minimum size, DMSC selects the multi-core sequences (MCS) as the seeds; these are defined as the n-core sequences (n ≥ 3) in which the distance between any two sequences is less than the distance threshold. A new sequence is assigned to the corresponding cluster depending on its average distance to the MCS and the standard deviation of the distances within the MCS. Whenever a new sequence is added to the cluster, DMSC dynamically updates the MCS, until no more sequences are merged into the cluster. DMSC was tested on several simulated and real-life sequence datasets and compared with traditional heuristic methods such as CD-HIT, UCLUST, and DBH. Experimental results in terms of the inferred number of OTUs, normalized mutual information (NMI), and Matthews correlation coefficient (MCC) demonstrate that DMSC can produce higher-quality clusters with low memory usage and reduce OTU overestimation. Additionally, DMSC is robust to sequencing errors. The DMSC software can be freely downloaded from https://github.com/NWPU-903PR/DMSC. |
format | Online Article Text |
id | pubmed-6422886 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6422886 2019-03-26 DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs Wei, Ze-Gang Zhang, Shao-Wu Front Microbiol Microbiology Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is often the first step for many downstream analyses. Heuristic clustering is one of the most widely employed approaches for generating OTUs. However, most heuristic OTU clustering methods select a single seed sequence to represent each cluster, so their outcomes suffer from either overestimation of the number of OTUs or sensitivity to sequencing errors. In this paper, we present a novel dynamic multi-seeds clustering method (namely DMSC) to pick OTUs. DMSC first heuristically generates clusters according to the distance threshold. When the size of a cluster reaches the pre-defined minimum size, DMSC selects the multi-core sequences (MCS) as the seeds; these are defined as the n-core sequences (n ≥ 3) in which the distance between any two sequences is less than the distance threshold. A new sequence is assigned to the corresponding cluster depending on its average distance to the MCS and the standard deviation of the distances within the MCS. Whenever a new sequence is added to the cluster, DMSC dynamically updates the MCS, until no more sequences are merged into the cluster. DMSC was tested on several simulated and real-life sequence datasets and compared with traditional heuristic methods such as CD-HIT, UCLUST, and DBH. Experimental results in terms of the inferred number of OTUs, normalized mutual information (NMI), and Matthews correlation coefficient (MCC) demonstrate that DMSC can produce higher-quality clusters with low memory usage and reduce OTU overestimation. Additionally, DMSC is robust to sequencing errors. The DMSC software can be freely downloaded from https://github.com/NWPU-903PR/DMSC. Frontiers Media S.A. 2019-03-12 /pmc/articles/PMC6422886/ /pubmed/30915052 http://dx.doi.org/10.3389/fmicb.2019.00428 Text en Copyright © 2019 Wei and Zhang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Wei, Ze-Gang Zhang, Shao-Wu DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs |
title | DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs |
title_full | DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs |
title_fullStr | DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs |
title_full_unstemmed | DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs |
title_short | DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs |
title_sort | dmsc: a dynamic multi-seeds method for clustering 16s rrna sequences into otus |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6422886/ https://www.ncbi.nlm.nih.gov/pubmed/30915052 http://dx.doi.org/10.3389/fmicb.2019.00428 |
work_keys_str_mv | AT weizegang dmscadynamicmultiseedsmethodforclustering16srrnasequencesintootus AT zhangshaowu dmscadynamicmultiseedsmethodforclustering16srrnasequencesintootus |
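
The clustering procedure summarized in the description field can be illustrated with a small Python sketch. This is not the DMSC implementation from the authors' repository; it is a toy reading of the abstract under several assumptions that the record does not confirm. The function names (`distance`, `pick_mcs`, `dmsc_sketch`) are hypothetical, the distance is a simple mismatch fraction over equal-length strings rather than an alignment-based distance, the multi-core sequences are chosen as the valid n-tuple with the smallest total pairwise distance, and the acceptance rule "average distance to the MCS ≤ threshold + standard deviation within the MCS" is one plausible interpretation of the stated criterion.

```python
from itertools import combinations
from statistics import mean, pstdev


def distance(a, b):
    """Toy distance (assumption): fraction of mismatched positions."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_mcs(members, threshold, n=3):
    """Pick n members whose pairwise distances are all below the threshold.

    Among valid n-tuples, the one with the smallest total pairwise
    distance is returned (an assumption); None if no tuple qualifies.
    """
    best, best_cost = None, None
    for combo in combinations(members, n):
        dists = [distance(a, b) for a, b in combinations(combo, 2)]
        if all(d < threshold for d in dists):
            cost = sum(dists)
            if best is None or cost < best_cost:
                best, best_cost = list(combo), cost
    return best


def dmsc_sketch(seqs, threshold=0.03, min_size=3, n=3):
    """Greedy multi-seed clustering loosely following the abstract."""
    clusters = []  # each cluster: {"seed": str, "members": list, "mcs": list or None}
    for s in seqs:
        for c in clusters:
            if c["mcs"] is None:
                # Small cluster: fall back to a single-seed comparison.
                accept = distance(s, c["seed"]) <= threshold
            else:
                # Hypothetical rule: mean distance to the MCS, relaxed by
                # the spread of pairwise distances inside the MCS.
                avg = mean([distance(s, m) for m in c["mcs"]])
                spread = pstdev(
                    [distance(a, b) for a, b in combinations(c["mcs"], 2)]
                )
                accept = avg <= threshold + spread
            if accept:
                c["members"].append(s)
                if len(c["members"]) >= min_size:
                    # Re-select the MCS after every addition ("dynamic" update);
                    # keep the previous MCS if no valid tuple exists.
                    c["mcs"] = pick_mcs(c["members"], threshold, n) or c["mcs"]
                break
        else:
            # No existing cluster accepted the sequence: it seeds a new one.
            clusters.append({"seed": s, "members": [s], "mcs": None})
    return clusters
```

Calling `dmsc_sketch(list_of_sequences, threshold=0.03)` returns a list of cluster dictionaries, one per inferred OTU. The exhaustive search in `pick_mcs` is cubic in cluster size for n = 3 and is only practical for toy inputs; a real tool would need a cheaper selection strategy.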