Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores

Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and v...

Bibliographic Details
Main Authors: Schwengers, Oliver; Barth, Patrick; Falgenhauer, Linda; Hain, Torsten; Chakraborty, Trinad; Goesmann, Alexander
Format: Online Article Text
Language: English
Published: Microbiology Society 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660248/
https://www.ncbi.nlm.nih.gov/pubmed/32579097
http://dx.doi.org/10.1099/mgen.0.000398
_version_ 1783608971271077888
author Schwengers, Oliver
Barth, Patrick
Falgenhauer, Linda
Hain, Torsten
Chakraborty, Trinad
Goesmann, Alexander
author_sort Schwengers, Oliver
collection PubMed
description Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next-generation sequencing methods, the quantity of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included in existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1=82.6 %) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced Escherichia coli isolates. Platon is available at: http://platon.computational.bio/.
format Online
Article
Text
id pubmed-7660248
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-7660248 2020-11-13 Microb Genom Research Article Microbiology Society 2020-06-24 /pmc/articles/PMC7660248/ /pubmed/32579097 http://dx.doi.org/10.1099/mgen.0.000398 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
title Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660248/
https://www.ncbi.nlm.nih.gov/pubmed/32579097
http://dx.doi.org/10.1099/mgen.0.000398