
k-SLAM: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets


Bibliographic Details
Main authors: Ainsworth, David; Sternberg, Michael J.E.; Raczy, Come; Butcher, Sarah A.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389551/
https://www.ncbi.nlm.nih.gov/pubmed/27965413
http://dx.doi.org/10.1093/nar/gkw1248
Description
Summary: k-SLAM is a highly efficient algorithm for the characterization of metagenomic data. Unlike other ultra-fast metagenomic classifiers, it performs full sequence alignment, which allows gene identification and variant calling in addition to accurate taxonomic classification. Its k-mer based method provides greater taxonomic accuracy than other classifiers and is three orders of magnitude faster than alignment-based approaches. Using alignments to find variants and genes along with their taxonomic origins enables novel strains to be characterized. k-SLAM's speed makes full taxonomic classification and gene identification tractable on modern large data sets. A pseudo-assembly method increases classification accuracy by up to 40% for species that have high sequence homology within their genus.