
Detection of plasmid contigs in draft genome assemblies using customized Kraken databases

Plasmids play an important role in bacterial evolution and mediate horizontal transfer of genes including virulence and antimicrobial resistance genes. Although short-read sequencing technologies have enabled large-scale bacterial genomics, the resulting draft genome assemblies are often fragmented...


Bibliographic Details
Main Authors: Gomi, Ryota; Wyres, Kelly L.; Holt, Kathryn E.
Format: Online Article Text
Language: English
Published: Microbiology Society 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208688/
https://www.ncbi.nlm.nih.gov/pubmed/33826492
http://dx.doi.org/10.1099/mgen.0.000550
_version_ 1783708971370020864
author Gomi, Ryota
Wyres, Kelly L.
Holt, Kathryn E.
author_facet Gomi, Ryota
Wyres, Kelly L.
Holt, Kathryn E.
author_sort Gomi, Ryota
collection PubMed
description Plasmids play an important role in bacterial evolution and mediate horizontal transfer of genes, including virulence and antimicrobial resistance genes. Although short-read sequencing technologies have enabled large-scale bacterial genomics, the resulting draft genome assemblies are often fragmented into hundreds of discrete contigs. Several tools and approaches have been developed to identify plasmid sequences in such assemblies, but these require a trade-off between sensitivity and specificity. Here we propose using the Kraken classifier, together with a custom Kraken database comprising known chromosomal and plasmid sequences of the Klebsiella pneumoniae species complex (KpSC), to identify plasmid-derived contigs in draft assemblies. We assessed performance using Illumina-based draft genome assemblies for 82 KpSC isolates, for which complete genomes were available to supply ground truth. When benchmarked against five other classifiers (Centrifuge, RFPlasmid, mlplasmids, PlaScope and Platon), Kraken showed balanced performance in terms of overall sensitivity and specificity (90.8 and 99.4 %, respectively, for contig count; 96.5 and >99.9 %, respectively, for cumulative contig length), and the highest accuracy (96.8 % vs 91.8-96.6 % for contig count; 99.8 % vs 99.0-99.7 % for cumulative contig length) and F1-score (94.5 % vs 84.5-94.1 % for contig count; 98.0 % vs 88.9-96.7 % for cumulative contig length). Kraken also achieved consistent performance across our genome collection. Furthermore, we demonstrate that expanding the Kraken database with additional known chromosomal and plasmid sequences can further improve classification performance. Although we have focused here on the KpSC, this methodology could easily be applied to other species with a sufficient number of completed genomes.
format Online
Article
Text
id pubmed-8208688
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-8208688 2021-06-17 Detection of plasmid contigs in draft genome assemblies using customized Kraken databases Gomi, Ryota Wyres, Kelly L. Holt, Kathryn E. Microb Genom Methods Plasmids play an important role in bacterial evolution and mediate horizontal transfer of genes, including virulence and antimicrobial resistance genes. Although short-read sequencing technologies have enabled large-scale bacterial genomics, the resulting draft genome assemblies are often fragmented into hundreds of discrete contigs. Several tools and approaches have been developed to identify plasmid sequences in such assemblies, but these require a trade-off between sensitivity and specificity. Here we propose using the Kraken classifier, together with a custom Kraken database comprising known chromosomal and plasmid sequences of the Klebsiella pneumoniae species complex (KpSC), to identify plasmid-derived contigs in draft assemblies. We assessed performance using Illumina-based draft genome assemblies for 82 KpSC isolates, for which complete genomes were available to supply ground truth. When benchmarked against five other classifiers (Centrifuge, RFPlasmid, mlplasmids, PlaScope and Platon), Kraken showed balanced performance in terms of overall sensitivity and specificity (90.8 and 99.4 %, respectively, for contig count; 96.5 and >99.9 %, respectively, for cumulative contig length), and the highest accuracy (96.8 % vs 91.8-96.6 % for contig count; 99.8 % vs 99.0-99.7 % for cumulative contig length) and F1-score (94.5 % vs 84.5-94.1 % for contig count; 98.0 % vs 88.9-96.7 % for cumulative contig length). Kraken also achieved consistent performance across our genome collection. Furthermore, we demonstrate that expanding the Kraken database with additional known chromosomal and plasmid sequences can further improve classification performance. Although we have focused here on the KpSC, this methodology could easily be applied to other species with a sufficient number of completed genomes.
Microbiology Society 2021-04-07 /pmc/articles/PMC8208688/ /pubmed/33826492 http://dx.doi.org/10.1099/mgen.0.000550 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License.
spellingShingle Methods
Gomi, Ryota
Wyres, Kelly L.
Holt, Kathryn E.
Detection of plasmid contigs in draft genome assemblies using customized Kraken databases
title Detection of plasmid contigs in draft genome assemblies using customized Kraken databases
title_full Detection of plasmid contigs in draft genome assemblies using customized Kraken databases
title_fullStr Detection of plasmid contigs in draft genome assemblies using customized Kraken databases
title_full_unstemmed Detection of plasmid contigs in draft genome assemblies using customized Kraken databases
title_short Detection of plasmid contigs in draft genome assemblies using customized Kraken databases
title_sort detection of plasmid contigs in draft genome assemblies using customized kraken databases
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208688/
https://www.ncbi.nlm.nih.gov/pubmed/33826492
http://dx.doi.org/10.1099/mgen.0.000550
work_keys_str_mv AT gomiryota detectionofplasmidcontigsindraftgenomeassembliesusingcustomizedkrakendatabases
AT wyreskellyl detectionofplasmidcontigsindraftgenomeassembliesusingcustomizedkrakendatabases
AT holtkathryne detectionofplasmidcontigsindraftgenomeassembliesusingcustomizedkrakendatabases
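
The abstract reports sensitivity, specificity, accuracy and F1-score in two ways: per contig ("contig count") and weighted by contig size ("cumulative contig length"). The sketch below illustrates how the two views can differ on the same set of calls. It is not the authors' evaluation code: the `Contig` class, the `metrics` function and the toy contigs are hypothetical, and it assumes that "plasmid" is treated as the positive class and that every contig receives a call.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record for one contig of a draft assembly."""
    length: int           # contig length in bp
    is_plasmid: bool      # ground truth (e.g. from the complete genome)
    called_plasmid: bool  # classifier call


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(contigs, by_length=False):
    """Confusion-matrix metrics with plasmid as the positive class.

    With by_length=False every contig counts once (contig count);
    with by_length=True every contig counts its length in bp
    (cumulative contig length).
    """
    tp = fp = tn = fn = 0
    for c in contigs:
        weight = c.length if by_length else 1
        if c.is_plasmid and c.called_plasmid:
            tp += weight
        elif c.is_plasmid:
            fn += weight
        elif c.called_plasmid:
            fp += weight
        else:
            tn += weight
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy data (made up for illustration, not taken from the study).
toy = [
    Contig(500_000, False, False),  # large chromosomal contig, correct
    Contig(300_000, False, False),  # large chromosomal contig, correct
    Contig(2_000, False, True),     # short chromosomal contig, miscalled
    Contig(80_000, True, True),     # plasmid contig, correct
    Contig(40_000, True, True),     # plasmid contig, correct
    Contig(1_500, True, False),     # short plasmid contig, missed
]

by_count = metrics(toy)
by_length = metrics(toy, by_length=True)

for name in by_count:
    print(f"{name}: count={by_count[name]:.3f} length={by_length[name]:.3f}")
```

In this toy set the two errors fall on short contigs, so the length-weighted figures come out higher than the per-contig figures. The abstract's numbers show the same direction of difference (for example, 90.8 % sensitivity by contig count versus 96.5 % by cumulative contig length), although the source record does not state the cause.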