Swarm v2: highly-scalable and high-resolution amplicon clustering

Bibliographic Details
Main Authors: Mahé, Frédéric; Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2015
Subjects: Biodiversity
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4690345/
https://www.ncbi.nlm.nih.gov/pubmed/26713226
http://dx.doi.org/10.7717/peerj.1420
collection PubMed
description Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
id pubmed-4690345
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4690345 2015-12-28
PeerJ Biodiversity
PeerJ Inc. 2015-12-10
Text en
© 2015 Mahé et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
topic Biodiversity