mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species

Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These were used to train several po...

Bibliographic Details
Main Authors: Arredondo-Alonso, Sergio, Rogers, Malbert R. C., Braat, Johanna C., Verschuuren, Tess D., Top, Janetta, Corander, Jukka, Willems, Rob J. L., Schürch, Anita C.
Format: Online Article Text
Language: English
Published: Microbiology Society 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6321875/
https://www.ncbi.nlm.nih.gov/pubmed/30383524
http://dx.doi.org/10.1099/mgen.0.000224
author Arredondo-Alonso, Sergio
Rogers, Malbert R. C.
Braat, Johanna C.
Verschuuren, Tess D.
Top, Janetta
Corander, Jukka
Willems, Rob J. L.
Schürch, Anita C.
author_sort Arredondo-Alonso, Sergio
collection PubMed
description Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These were used to train several popular machine learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76), which outperformed other existing plasmid prediction tools on a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates and illustrated their applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical user interface called ‘mlplasmids’. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
format Online
Article
Text
id pubmed-6321875
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-6321875 2019-02-25 Microb Genom Research Article
Microbiology Society 2018-11-01 /pmc/articles/PMC6321875/ /pubmed/30383524 http://dx.doi.org/10.1099/mgen.0.000224 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6321875/
https://www.ncbi.nlm.nih.gov/pubmed/30383524
http://dx.doi.org/10.1099/mgen.0.000224
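
The description above states that contigs were classified using pentamer frequencies. As an illustration only, the following Python sketch shows one way such a feature vector could be computed for a single contig. It is not taken from the mlplasmids R package: the function name `pentamer_frequencies`, the handling of ambiguous bases, and the choice not to merge reverse complements are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def pentamer_frequencies(sequence, k=5):
    """Return the relative frequency of every DNA k-mer (default k=5) in a contig.

    Hypothetical helper for illustration; not the mlplasmids implementation.
    """
    seq = sequence.upper()
    # All 4**k possible k-mers over the unambiguous DNA alphabet (1024 for k=5).
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows containing ambiguous bases (e.g. N) are ignored,
    # so only windows that match one of the 4**k k-mers contribute.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


if __name__ == "__main__":
    freqs = pentamer_frequencies("ACGTACGTACGTTTGACA")
    print(len(freqs))        # number of features per contig
    print(freqs["ACGTA"])    # relative frequency of one pentamer
```

Each contig then yields a fixed-length vector of 1024 values that a classifier such as an SVM could take as input, together with a plasmid or chromosome label derived from the complete genomes.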