Use of Graph Database for the Integration of Heterogeneous Biological Data

Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join s...

Bibliographic Details
Main Authors:	Yoon, Byoung-Ha, Kim, Seon-Kyu, Kim, Seon-Young
Format:	Online Article Text
Language:	English
Published:	Korea Genome Organization 2017
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389944/ https://www.ncbi.nlm.nih.gov/pubmed/28416946 http://dx.doi.org/10.5808/GI.2017.15.1.19

_version_	1782521358782038016
author	Yoon, Byoung-Ha Kim, Seon-Kyu Kim, Seon-Young
author_facet	Yoon, Byoung-Ha Kim, Seon-Kyu Kim, Seon-Young
author_sort	Yoon, Byoung-Ha
collection	PubMed
description	Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
format	Online Article Text
id	pubmed-5389944
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Korea Genome Organization
record_format	MEDLINE/PubMed
spelling	pubmed-53899442017-04-17 Use of Graph Database for the Integration of Heterogeneous Biological Data Yoon, Byoung-Ha Kim, Seon-Kyu Kim, Seon-Young Genomics Inform Original Article Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. 
Korea Genome Organization 2017-03 2017-03-29 /pmc/articles/PMC5389944/ /pubmed/28416946 http://dx.doi.org/10.5808/GI.2017.15.1.19 Text en Copyright © 2017 by the Korea Genome Organization http://creativecommons.org/licenses/by-nc/4.0/ It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/).
spellingShingle	Original Article Yoon, Byoung-Ha Kim, Seon-Kyu Kim, Seon-Young Use of Graph Database for the Integration of Heterogeneous Biological Data
title	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_full	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_fullStr	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_full_unstemmed	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_short	Use of Graph Database for the Integration of Heterogeneous Biological Data
title_sort	use of graph database for the integration of heterogeneous biological data
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389944/ https://www.ncbi.nlm.nih.gov/pubmed/28416946 http://dx.doi.org/10.5808/GI.2017.15.1.19
work_keys_str_mv	AT yoonbyoungha useofgraphdatabasefortheintegrationofheterogeneousbiologicaldata AT kimseonkyu useofgraphdatabasefortheintegrationofheterogeneousbiologicaldata AT kimseonyoung useofgraphdatabasefortheintegrationofheterogeneousbiologicaldata
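
The abstract contrasts relational storage, where relationships are inferred through multiple-join statements, with a graph model that represents them natively. The following is a minimal sketch of that contrast, not the authors' benchmark: it uses Python's built-in sqlite3 module in place of MySQL and a plain adjacency dictionary in place of Neo4j, and every table name, node name, and row is a hypothetical placeholder rather than data from the paper.

```python
import sqlite3

# Hypothetical toy data; none of these rows come from the paper's dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drug_target (drug TEXT, gene TEXT);
    CREATE TABLE gene_disease (gene TEXT, disease TEXT);
    INSERT INTO drug_target VALUES ('drugA', 'GENE1'), ('drugB', 'GENE2');
    INSERT INTO gene_disease VALUES
        ('GENE1', 'diseaseX'), ('GENE1', 'diseaseZ'), ('GENE2', 'diseaseY');
""")


def diseases_via_join(drug):
    """Relational style: the drug -> gene -> disease link is inferred by a JOIN."""
    rows = conn.execute(
        "SELECT gd.disease FROM drug_target AS dt "
        "JOIN gene_disease AS gd ON gd.gene = dt.gene "
        "WHERE dt.drug = ? ORDER BY gd.disease",
        (drug,),
    ).fetchall()
    return [row[0] for row in rows]


# Graph style: each node stores its outgoing relationships directly,
# so a two-hop question is answered by following edges, with no join.
graph = {
    "drugA": ["GENE1"],
    "drugB": ["GENE2"],
    "GENE1": ["diseaseX", "diseaseZ"],
    "GENE2": ["diseaseY"],
}


def diseases_via_traversal(drug):
    """Follow drug -> gene edges, then gene -> disease edges."""
    return sorted(
        disease
        for gene in graph.get(drug, [])
        for disease in graph.get(gene, [])
    )


print(diseases_via_join("drugA"))
print(diseases_via_traversal("drugA"))
```

Each additional hop adds another JOIN in the relational version, whereas the traversal version only adds another loop over neighbors. This sketch illustrates the modeling difference only and says nothing about the relative performance reported in the abstract.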