Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search

Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building b...

Bibliographic Details
Main authors: Alocci, Davide; Mariethoz, Julien; Horlacher, Oliver; Bolleman, Jerven T.; Campbell, Matthew P.; Lisacek, Frederique
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4684231/
https://www.ncbi.nlm.nih.gov/pubmed/26656740
http://dx.doi.org/10.1371/journal.pone.0144578
_version_ 1782406153820438528
author Alocci, Davide
Mariethoz, Julien
Horlacher, Oliver
Bolleman, Jerven T.
Campbell, Matthew P.
Lisacek, Frederique
author_sort Alocci, Davide
collection PubMed
description Resource Description Framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, graph databases are possible software solutions for storing glycan structures, and graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for storage in an RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology best suited to solving glycan substructure search and other comparable issues.
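The graph encoding described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation (the study runs SPARQL and Cypher queries against database back ends); it is a plain-Python illustration under stated assumptions. The `Residue` class, the linkage labels such as `"b1-4"`, and the helper names are hypothetical and chosen only for this example. Each node is a building block, each edge carries a linkage label, and a substructure search asks whether a pattern tree can be embedded somewhere in a glycan tree.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    """One building block of a glycan (hypothetical model for illustration)."""
    name: str
    # Each entry is (linkage label, child residue); the edge is the chemical linkage.
    children: list = field(default_factory=list)

    def add(self, linkage, child):
        """Attach a child residue through the given linkage and return self."""
        self.children.append((linkage, child))
        return self


def matches_at(node, pattern):
    """Return True if `pattern` can be embedded with its root mapped onto `node`."""
    if node.name != pattern.name:
        return False
    return _assign(node.children, pattern.children, 0, set())


def _assign(node_children, pattern_children, i, used):
    """Backtracking: map each pattern branch to a distinct branch of the node."""
    if i == len(pattern_children):
        return True
    p_link, p_child = pattern_children[i]
    for j, (link, child) in enumerate(node_children):
        if j in used or link != p_link:
            continue
        if matches_at(child, p_child):
            used.add(j)
            if _assign(node_children, pattern_children, i + 1, used):
                return True
            used.remove(j)
    return False


def iter_nodes(root):
    """Yield every residue reachable from `root`."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(child for _, child in node.children)


def contains_substructure(glycan, pattern):
    """Return True if `pattern` occurs anywhere inside `glycan`."""
    return any(matches_at(node, pattern) for node in iter_nodes(glycan))


# Illustrative glycan: the N-glycan core, rooted at the reducing-end GlcNAc.
core = Residue("GlcNAc").add(
    "b1-4",
    Residue("GlcNAc").add(
        "b1-4",
        Residue("Man")
        .add("a1-3", Residue("Man"))
        .add("a1-6", Residue("Man")),
    ),
)

present = Residue("Man").add("a1-3", Residue("Man"))
absent = Residue("Man").add("a1-2", Residue("Man"))

print(contains_substructure(core, present))
print(contains_substructure(core, absent))
```

In a graph database the same question would be posed declaratively as a pattern in the query language, with the engine doing the matching; the sketch only makes the node-and-edge model concrete.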
format Online
Article
Text
id pubmed-4684231
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4684231 2015-12-31 Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search Alocci, Davide Mariethoz, Julien Horlacher, Oliver Bolleman, Jerven T. Campbell, Matthew P. Lisacek, Frederique PLoS One Research Article
Public Library of Science 2015-12-14 /pmc/articles/PMC4684231/ /pubmed/26656740 http://dx.doi.org/10.1371/journal.pone.0144578 Text en © 2015 Alocci et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search
title_sort property graph vs rdf triple store: a comparison on glycan substructure search
topic Research Article