
On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data


Bibliographic Details
Main authors: Arredondo-Alonso, Sergio; Willems, Rob J.; van Schaik, Willem; Schürch, Anita C.
Format: Online article, text
Language: English
Published: Microbiology Society 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5695206/
https://www.ncbi.nlm.nih.gov/pubmed/29177087
http://dx.doi.org/10.1099/mgen.0.000128
collection PubMed
description To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera, containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the outcome to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84 % of the predictions from genomes with multiple plasmids into a single bin. Recycler searches the assembly graph for sub-graphs corresponding to circular sequences; it correctly predicted small plasmids but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall and precision of 0.76 and 0.62, respectively. However, cBar only categorizes contigs as plasmid-derived and does not bin them into individual plasmids. PlasmidFinder, which searches for replicons, had the highest precision (1.0), but its recall (0.36) was limited by the contents of its database and by the contig lengths obtained from de novo assembly. PlasmidSPAdes and Recycler detected putative small plasmids (<10 kbp) that were also predicted as plasmids by cBar but were absent from the original assembly. This study shows that it is possible to automatically predict small plasmids. Prediction of large plasmids (>50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
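The precision and recall figures above follow the usual definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). A precision of 0.75 is therefore the same statement as "a quarter of the predicted plasmid contigs were false positives". The minimal Python sketch below computes both metrics. Separately, it computes a pentamer (k = 5) frequency profile of the kind cBar classifies.

It is illustrative only and rests on several assumptions:

- The function names are hypothetical.
- The counts are assumed to be already tallied per predicted contig. This record does not state the benchmark's exact matching criteria.
- k-mers are counted on one strand only.
- cBar's trained classifier is not reproduced.

```python
from collections import Counter
from itertools import product


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive and
    false-negative counts; an empty denominator yields 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pentamer_profile(seq: str, k: int = 5) -> dict[str, float]:
    """Relative frequency of every k-mer over the alphabet ACGT.

    Hypothetical helper: windows containing any other character (e.g. N)
    are skipped, and only the given strand is counted (no merging with
    reverse complements). cBar's actual feature encoding may differ.
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Enumerate all 4**k k-mers so every contig gets a vector of equal length.
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in ("".join(p) for p in product("ACGT", repeat=k))
    }


if __name__ == "__main__":
    print(precision_recall(tp=3, fp=1, fn=1))
    profile = pentamer_profile("ACGTACGTAC")
    print(len(profile), profile["ACGTA"])
```

For example, `precision_recall(tp=3, fp=1, fn=1)` returns `(0.75, 0.75)`. `pentamer_profile("ACGTACGTAC")` returns a 1,024-entry dictionary in which `ACGTA` has frequency 2/6. Emitting a fixed-length vector for every contig is what lets a classifier compare contigs of very different lengths.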
id pubmed-5695206
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Microb Genom (Methods Paper), Microbiology Society, 2017-08-18; record pubmed-5695206, 2017-11-24
license © 2017 The Authors. This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
topic Methods Paper