
From Trees to Clouds: PhageClouds for Fast Comparison of ∼640,000 Phage Genomic Sequences and Host-Centric Visualization Using Genomic Network Graphs

Background: Fast and computationally efficient strategies are required to explore genomic relationships within an increasingly large and diverse phage sequence space. Here, we present PhageClouds, a novel approach using a graph database of phage genomic sequences and their intergenomic distances to...


Bibliographic Details
Main authors: Rangel-Pineros, Guillermo; Millard, Andrew; Michniewski, Slawomir; Scanlan, David; Sirén, Kimmo; Reyes, Alejandro; Petersen, Bent; Clokie, Martha R.J.; Sicheritz-Pontén, Thomas
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc., publishers 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9041511/
https://www.ncbi.nlm.nih.gov/pubmed/36147515
http://dx.doi.org/10.1089/phage.2021.0008
author Rangel-Pineros, Guillermo
Millard, Andrew
Michniewski, Slawomir
Scanlan, David
Sirén, Kimmo
Reyes, Alejandro
Petersen, Bent
Clokie, Martha R.J.
Sicheritz-Pontén, Thomas
author_sort Rangel-Pineros, Guillermo
collection PubMed
description Background: Fast and computationally efficient strategies are required to explore genomic relationships within an increasingly large and diverse phage sequence space. Here, we present PhageClouds, a novel approach using a graph database of phage genomic sequences and their intergenomic distances to explore the phage genomic sequence space. Methods: A total of 640,000 phage genomic sequences were retrieved from a variety of databases and public virome assemblies. Intergenomic distances were calculated with dashing, an alignment-free method suitable for handling massive data sets. These data were used to build a Neo4j® graph database. Results: PhageClouds supported the search of related phages among all complete phage genomes from GenBank for a single query phage in just 10 s. Moreover, PhageClouds expanded the number of closely related phage sequences detected for both finished and draft phage genomes, in comparison with searches exclusively targeting phage entries from GenBank. Conclusions: PhageClouds is a novel resource that will facilitate the analysis of phage genomic sequences and the characterization of assembled phage genomes.
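The approach summarized in the description (alignment-free intergenomic distances stored as edges of a graph, then queried for the neighbors of one phage) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: it uses exact k-mer sets and a Mash-style distance derived from the Jaccard index, whereas dashing estimates set similarity from sketches; it uses an in-memory dictionary in place of a Neo4j graph database; and the function names, the k-mer size, and the distance cutoff are all hypothetical.

```python
import math
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers in a sequence (exact, no sketching)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a, b, k):
    """Mash-style distance computed from the exact Jaccard index.

    Assumption: this formula stands in for whatever distance the
    original pipeline obtained from dashing.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 1.0
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: treat as maximally distant
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def build_graph(genomes, k, max_dist):
    """Link every pair of genomes whose distance is at most max_dist.

    The result maps each genome name to {neighbor name: distance}.
    """
    graph = {name: {} for name in genomes}
    for x, y in combinations(genomes, 2):
        d = mash_distance(genomes[x], genomes[y], k)
        if d <= max_dist:
            graph[x][y] = d
            graph[y][x] = d
    return graph


def related(graph, query):
    """Return the neighbors of a genome, closest first."""
    return sorted(graph[query].items(), key=lambda kv: kv[1])


# Toy sequences (hypothetical, far shorter than real phage genomes).
genomes = {
    "p1": "ACGTTGCAAGCT",
    "p2": "ACGTTGCAAGCA",
    "p3": "GGGGGGGGGGGG",
}
graph = build_graph(genomes, k=4, max_dist=0.2)
print(related(graph, "p1"))
```

The all-pairs loop is quadratic in the number of genomes, so it only suits toy inputs; the sketch is meant to show the data model (genomes as nodes, distances as edge weights), not how a collection of hundreds of thousands of sequences would be processed.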
format Online
Article
Text
id pubmed-9041511
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Mary Ann Liebert, Inc., publishers
record_format MEDLINE/PubMed
spelling pubmed-9041511 2022-09-21 Phage (New Rochelle) Original Articles Mary Ann Liebert, Inc., publishers 2021-12-01 2021-12-16 /pmc/articles/PMC9041511/ /pubmed/36147515 http://dx.doi.org/10.1089/phage.2021.0008 Text en © Guillermo Rangel-Pineros et al. 2021; Published by Mary Ann Liebert, Inc.
This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License [CC-BY-NC] (https://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are cited.
title From Trees to Clouds: PhageClouds for Fast Comparison of ∼640,000 Phage Genomic Sequences and Host-Centric Visualization Using Genomic Network Graphs
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9041511/
https://www.ncbi.nlm.nih.gov/pubmed/36147515
http://dx.doi.org/10.1089/phage.2021.0008