Using Graph Databases
Main author: | |
---|---|
Language: | eng |
Published: | 2020 |
Subjects: | |
Online access: | https://dx.doi.org/10.1051/epjconf/202024504004 http://cds.cern.ch/record/2712902 |
Summary: | Data in HEP are usually stored in tuples (tables), trees, nested tuples (trees of tuples) or relational (SQL-like) databases, with or without a defined schema. But many of our data are graph-like and schema-less. They consist of entities with relations, some of which are known in advance, but many of which are created ad hoc, later. Such structures are not well covered by relational (SQL) databases. We need not only the ability to add new data with pre-defined relations; we also need to add new relations. Graph databases have existed for a long time, but they have matured only recently, thanks to Big Data and AI (adaptive NN). There are now very good implementations and de facto standards available. The difference between SQL and graph databases is similar to the difference between Fortran and C++: on one side, a rigid system that can be heavily optimized; on the other, a flexible, dynamic system that allows complex structures to be expressed. A graph database is a synthesis of OODB and SQLDB. Graph databases allow a web of objects to be expressed without the fragility of the OO world. They capture only the essential relations; they do not keep a complete object dump. Migrating to a graph database means moving structure from data to code, together with a migration from imperative to declarative semantics (things don't happen, but exist). The paper describes the basic principles of graph databases, together with an overview of existing standards and implementations. Their usefulness and usability are demonstrated on the concrete example of the ATLAS Event Index in two approaches: as full storage (all data are in the graph database) and as meta-storage (a layer of schema-less, graph-like data implemented on top of more traditional storage). The usability, the interfaces with the surrounding framework and the performance of those solutions are discussed. The possible more general usefulness for generic experiments' storage is also discussed. |
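
The summary's central point, that new kinds of relations can be added later without changing any schema, can be illustrated with a minimal in-memory sketch. This is not the paper's implementation and does not use any real graph database API; the class `PropertyGraph`, the vertex ids and the relation labels (`contains`, `includes`, `tagged_as`) are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy schema-less property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.vertices = {}              # vertex id -> dict of free-form properties
        self.edges = defaultdict(list)  # source id -> list of (label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = dict(props)

    def add_edge(self, src, label, dst, **props):
        # Any label is accepted: introducing a new relation type needs no schema change.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must already exist")
        self.edges[src].append((label, dst, dict(props)))

    def out(self, src, label):
        """Return the ids of vertices reachable from src over edges with this label."""
        return [dst for (lbl, dst, _) in self.edges.get(src, []) if lbl == label]


g = PropertyGraph()
g.add_vertex("run:1", kind="run")
g.add_vertex("event:42", kind="event")
g.add_vertex("dataset:A", kind="dataset")
g.add_vertex("tag:interesting", kind="tag")

# Relations known in advance.
g.add_edge("run:1", "contains", "event:42")
g.add_edge("dataset:A", "includes", "event:42")

# A relation invented later, ad hoc, without touching the existing data.
g.add_edge("event:42", "tagged_as", "tag:interesting", reason="manual scan")

print(g.out("run:1", "contains"))
print(g.out("event:42", "tagged_as"))
```

In a relational layout, the `tagged_as` relation would typically require a new join table or column; here it is just another labeled edge, which is the flexibility the summary contrasts with the rigidity of SQL.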