
Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses

Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs fo...


Bibliographic Details
Main Authors:	Le, Kevin K; Whiteside, Matthew D; Hopkins, James E; Gannon, Victor P J; Laing, Chad R
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146121/ https://www.ncbi.nlm.nih.gov/pubmed/30212910 http://dx.doi.org/10.1093/database/bay086

author	Le, Kevin K; Whiteside, Matthew D; Hopkins, James E; Gannon, Victor P J; Laing, Chad R
author_sort	Le, Kevin K
collection	PubMed
description	Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10 243 Escherichia coli genomes, for which in-silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analyses approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.
format	Online Article Text
id	pubmed-6146121
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6146121 2018-09-25 Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses Le, Kevin K; Whiteside, Matthew D; Hopkins, James E; Gannon, Victor P J; Laing, Chad R Database (Oxford) Original Article Oxford University Press 2018-09-13 /pmc/articles/PMC6146121/ /pubmed/30212910 http://dx.doi.org/10.1093/database/bay086 Text en © Crown copyright 2018. This article contains public sector information licensed under the Open Government Licence v3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).
title	Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146121/ https://www.ncbi.nlm.nih.gov/pubmed/30212910 http://dx.doi.org/10.1093/database/bay086
