
VirSorter: mining viral signal from microbial genomic data

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and r...


Bibliographic Details
Main authors: Roux, Simon; Enault, Francois; Hurwitz, Bonnie L.; Sullivan, Matthew B.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2015
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4451026/
https://www.ncbi.nlm.nih.gov/pubmed/26038737
http://dx.doi.org/10.7717/peerj.985
author Roux, Simon
Enault, Francois
Hurwitz, Bonnie L.
Sullivan, Matthew B.
collection PubMed
description Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger in scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequences (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
format Online
Article
Text
id pubmed-4451026
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4451026 2015-06-02 VirSorter: mining viral signal from microbial genomic data Roux, Simon Enault, Francois Hurwitz, Bonnie L. Sullivan, Matthew B. PeerJ Bioinformatics PeerJ Inc. 2015-05-28 /pmc/articles/PMC4451026/ /pubmed/26038737 http://dx.doi.org/10.7717/peerj.985 Text en © 2015 Roux et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title VirSorter: mining viral signal from microbial genomic data
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4451026/
https://www.ncbi.nlm.nih.gov/pubmed/26038737
http://dx.doi.org/10.7717/peerj.985