PlasClass improves plasmid sequence classification

Many bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of pla...

Bibliographic Details
Main authors: Pellow, David; Mizrahi, Itzik; Shamir, Ron
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7159247/
https://www.ncbi.nlm.nih.gov/pubmed/32243433
http://dx.doi.org/10.1371/journal.pcbi.1007781
author Pellow, David
Mizrahi, Itzik
Shamir, Ron
collection PubMed
description Many bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice. We present PlasClass, a new plasmid classifier. It uses a set of standard classifiers trained on the most current set of known plasmid sequences for different sequence lengths. We tested PlasClass sequence classification on held-out data and simulations, as well as publicly available bacterial isolates and plasmidome samples and plasmids assembled from metagenomic samples. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, allowing it to achieve higher F1 scores in classifying sequences from a wide range of datasets. PlasClass also uses significantly less time and memory. PlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available under the MIT license from: https://github.com/Shamir-Lab/PlasClass.
format Online
Article
Text
id pubmed-7159247
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7159247
2020-04-24
PlasClass improves plasmid sequence classification
Pellow, David
Mizrahi, Itzik
Shamir, Ron
PLoS Comput Biol
Research Article
Many bacteria contain plasmids, but separating between contigs that originate on the plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained using a fraction of the known plasmids, and can be difficult to use in practice. We present PlasClass, a new plasmid classifier. It uses a set of standard classifiers trained on the most current set of known plasmid sequences for different sequence lengths. We tested PlasClass sequence classification on held-out data and simulations, as well as publicly available bacterial isolates and plasmidome samples and plasmids assembled from metagenomic samples. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, allowing it to achieve higher F1 scores in classifying sequences from a wide range of datasets. PlasClass also uses significantly less time and memory. PlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available under the MIT license from: https://github.com/Shamir-Lab/PlasClass.
Public Library of Science
2020-04-03
/pmc/articles/PMC7159247/
/pubmed/32243433
http://dx.doi.org/10.1371/journal.pcbi.1007781
Text
en
© 2020 Pellow et al
http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title PlasClass improves plasmid sequence classification
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7159247/
https://www.ncbi.nlm.nih.gov/pubmed/32243433
http://dx.doi.org/10.1371/journal.pcbi.1007781
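The description in this record says that PlasClass uses a set of classifiers trained for different sequence lengths and is evaluated by F1 score. The Python sketch below illustrates that general idea only and is not the PlasClass implementation. The length scales, the k-mer frequency features, the nearest-scale rule, and every function name are assumptions made for illustration; the record does not specify any of them.

```python
from collections import Counter
from itertools import product

# Hypothetical length scales in base pairs; the record does not list the real ones.
LENGTH_SCALES = (1_000, 10_000, 100_000)


def kmer_frequencies(seq, k=3):
    """Return normalized counts of every ACGT k-mer in seq, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other characters (e.g. N) are ignored in the total.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def pick_length_scale(seq_len, scales=LENGTH_SCALES):
    """Pick the trained length scale closest to seq_len (assumed selection rule)."""
    return min(scales, key=lambda s: abs(s - seq_len))


def f1_score(true_labels, predicted_labels, positive="plasmid"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a workflow of this shape, each contig would be routed with `pick_length_scale(len(contig))` to a classifier trained at that scale and fed the output of `kmer_frequencies(contig)`; the classifiers themselves are left out because the record gives no detail about them.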