Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores
Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and v...
Main authors: | Schwengers, Oliver; Barth, Patrick; Falgenhauer, Linda; Hain, Torsten; Chakraborty, Trinad; Goesmann, Alexander |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Microbiology Society 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660248/ https://www.ncbi.nlm.nih.gov/pubmed/32579097 http://dx.doi.org/10.1099/mgen.0.000398 |
_version_ | 1783608971271077888 |
---|---|
author | Schwengers, Oliver; Barth, Patrick; Falgenhauer, Linda; Hain, Torsten; Chakraborty, Trinad; Goesmann, Alexander |
author_facet | Schwengers, Oliver; Barth, Patrick; Falgenhauer, Linda; Hain, Torsten; Chakraborty, Trinad; Goesmann, Alexander |
author_sort | Schwengers, Oliver |
collection | PubMed |
description | Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next-generation sequencing methods, the quantity of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included in existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1=82.6 %) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced Escherichia coli isolates. Platon is available at: http://platon.computational.bio/. |
format | Online Article Text |
id | pubmed-7660248 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Microbiology Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-7660248 2020-11-13 Microb Genom Research Article Microbiology Society 2020-06-24 /pmc/articles/PMC7660248/ /pubmed/32579097 http://dx.doi.org/10.1099/mgen.0.000398 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License. |
title | Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660248/ https://www.ncbi.nlm.nih.gov/pubmed/32579097 http://dx.doi.org/10.1099/mgen.0.000398 |
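
The description reports both an accuracy (97.5 %) and an F1 score (82.6 %) for Platon. The two can differ widely when plasmid-borne contigs are a small minority of all contigs, because accuracy rewards the many correctly classified chromosomal contigs while F1 considers only the plasmid class. The following is a minimal sketch of how both metrics are derived from contig-level confusion counts. The class name `Confusion` and the counts are hypothetical illustrations and are not taken from the paper or from the Platon software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Contig-level confusion counts; 'positive' means plasmid-borne."""
    tp: int  # plasmid contigs predicted as plasmid
    fp: int  # chromosomal contigs predicted as plasmid
    tn: int  # chromosomal contigs predicted as chromosomal
    fn: int  # plasmid contigs predicted as chromosomal

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall for the plasmid class.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Hypothetical counts, NOT from the paper: 1,000 contigs, 100 of them plasmid-borne.
example = Confusion(tp=80, fp=10, tn=890, fn=20)
print(f"accuracy={example.accuracy():.3f} f1={example.f1():.3f}")
# prints: accuracy=0.970 f1=0.842
```

With these illustrative counts, 97 % of contigs are classified correctly, yet F1 is about 84 % because 20 of the 100 plasmid contigs are missed and 10 chromosomal contigs are misassigned. The sketch does not guard against empty classes, which would cause a division by zero.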