
Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT



Bibliographic Details
Main authors: von Meijenfeldt, F. A. Bastiaan, Arkhipova, Ksenia, Cambuy, Diego D., Coutinho, Felipe H., Dutilh, Bas E.
Format: Online Article Text
Language: English
Published: BioMed Central, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805573/
https://www.ncbi.nlm.nih.gov/pubmed/31640809
http://dx.doi.org/10.1186/s13059-019-1817-x
Description
Summary: Current-day metagenomics analyses increasingly involve de novo taxonomic classification of long DNA sequences and metagenome-assembled genomes. Here, we show that the conventional best-hit approach often leads to classifications that are too specific, especially when the sequences represent novel deep lineages. We present a classification method that integrates multiple signals to classify sequences (Contig Annotation Tool, CAT) and metagenome-assembled genomes (Bin Annotation Tool, BAT). Classifications are automatically made at low taxonomic ranks if closely related organisms are present in the reference database and at higher ranks otherwise. The result is a high classification precision even for sequences from considerably unknown organisms.
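To illustrate how integrating several signals lets a classification fall back to a higher rank when hits disagree, here is a minimal sketch. It is not CAT's code: it assumes each gene (ORF) on a contig already has a list of database hits given as (lineage, bitscore) pairs, with gene prediction and alignment out of scope. The function names `lca`, `classify_orf`, and `classify_contig` are hypothetical, and so are the exact roles given to the parameters `r` (keep hits within a fraction `r` of the top bitscore) and `f` (a taxon must gather more than a fraction `f` of the contig's total bitscore).

```python
from collections import defaultdict


def lca(lineages):
    """Lowest common ancestor of root-to-leaf lineages (their longest common prefix)."""
    prefix = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def classify_orf(hits, r=0.10):
    """Classify one ORF from its hits, a list of (lineage_tuple, bitscore).

    Assumption: hits scoring at least (1 - r) * top bitscore are kept and the
    ORF gets their LCA, so conflicting near-equal hits push it to a higher rank.
    Returns (lineage, top_bitscore).
    """
    top = max(score for _, score in hits)
    kept = [lineage for lineage, score in hits if score >= (1 - r) * top]
    return lca(kept), top


def classify_contig(orfs, r=0.10, f=0.5):
    """Combine ORF classifications into one contig lineage by bitscore voting.

    Assumptions: each ORF votes with its top bitscore for every prefix of its
    lineage; ORFs without hits are ignored; the deepest prefix supported by
    more than f * total wins. With f >= 0.5 the supported prefixes form a
    single chain from the root, so the deepest one is unique.
    """
    support = defaultdict(float)
    total = 0.0
    for hits in orfs:
        if not hits:
            continue
        lineage, top = classify_orf(hits, r)
        total += top
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += top
    best = ()
    for path, score in support.items():
        if score > f * total and len(path) > len(best):
            best = path
    return best


if __name__ == "__main__":
    gamma = ("root", "Bacteria", "Proteobacteria", "Gammaproteobacteria")
    alpha = ("root", "Bacteria", "Proteobacteria", "Alphaproteobacteria")
    firmicutes = ("root", "Bacteria", "Firmicutes")
    orf1 = [(gamma, 100.0), (alpha, 95.0)]  # near-equal conflicting hits
    orf2 = [(gamma, 200.0), (firmicutes, 120.0)]
    orf3 = [(firmicutes, 80.0)]
    print(classify_contig([orf1]))              # falls back to Proteobacteria
    print(classify_contig([orf1, orf2, orf3]))  # enough support for Gammaproteobacteria
```

In this toy input, the first ORF alone is only resolved to Proteobacteria because its two best hits disagree below that rank. Adding the other ORFs gives Gammaproteobacteria 200 of 380 bitscore units, just over half, so the contig is resolved one rank deeper. This mirrors, in a simplified way, the behavior described in the summary: specific when the evidence agrees, more conservative when it does not.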