An adaptive spark-based framework for querying large-scale NoSQL and relational databases

The growing popularity of big data analysis and cloud computing has created new big data management standards. Sometimes, programmers may interact with a number of heterogeneous data stores depending on the information they are responsible for: SQL and NoSQL data stores. Interacting with heterogeneo...

Bibliographic Details
Main authors: Khashan, Eman, Eldesouky, Ali, Elghamrawy, Sally
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8376024/
https://www.ncbi.nlm.nih.gov/pubmed/34411131
http://dx.doi.org/10.1371/journal.pone.0255562
author Khashan, Eman
Eldesouky, Ali
Elghamrawy, Sally
collection PubMed
description The growing popularity of big data analysis and cloud computing has created new big data management standards. Depending on the information they are responsible for, programmers may interact with a number of heterogeneous data stores, both SQL and NoSQL. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on multi-data processing developers. Indeed, complex queries concerning homogeneous data structures cannot currently be performed in a declarative manner when found in single data storage applications and therefore require additional development effort. Many models have been presented to address complex queries via multistore applications. Some of these models implemented a complex, unified, and fast model, while others are not efficient enough to solve this type of complex database query. This paper provides an automated, fast, and easy unified architecture to solve simple and complex SQL and NoSQL queries over heterogeneous data stores (CQNS). The proposed framework can be used in cloud environments or in any big data application to automatically help developers manage basic and complicated database queries. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of this architecture: in it, five user queries are examined to determine whether they match another five queries stored in a single engine in the architecture library. This is achieved through a proposed algorithm that directs the query to the right SQL or NoSQL database engine. Furthermore, CQNS deals with many NoSQL databases, such as MongoDB, Cassandra, Riak, CouchDB, and Neo4j. This paper presents a Spark framework that can handle both SQL and NoSQL databases.
Benchmark datasets for four scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization process performance and query execution time. The results show that CQNS achieves the best latency and throughput, in less time, among the compared systems.
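The three-layer flow named in the abstract (matching selector, processing, query execution) can be illustrated with a minimal sketch. This is not the authors' algorithm: the pattern library, the engine names used as dictionary keys, and the functions `select_engine`, `process`, `execute`, and `run` are all hypothetical, and the execution layer is a stand-in that only labels the query instead of contacting a database.

```python
import re
from typing import Optional

# Hypothetical pattern library: one list of stored query shapes per engine.
# The real CQNS library and matching rules are not described in this record.
PATTERN_LIBRARY = {
    "sql": [re.compile(r"^\s*select\b.+\bfrom\b", re.I | re.S)],
    "mongodb": [re.compile(r"^\s*db\.\w+\.(find|aggregate)\(", re.I)],
    "neo4j": [re.compile(r"^\s*match\s*\(", re.I)],
}


def select_engine(query: str) -> Optional[str]:
    """Matching selector layer (sketch): return the first engine whose
    stored patterns match the query, or None if nothing matches."""
    for engine, patterns in PATTERN_LIBRARY.items():
        if any(p.search(query) for p in patterns):
            return engine
    return None


def process(query: str) -> str:
    """Processing layer (sketch): normalize whitespace and drop a trailing ';'."""
    return query.strip().rstrip(";").strip()


def execute(engine: str, query: str) -> str:
    """Query execution layer (stand-in): label the query with its target
    engine instead of sending it to a real database."""
    return f"[{engine}] {query}"


def run(query: str) -> str:
    """Route one query through the three layers in order."""
    engine = select_engine(query)
    if engine is None:
        raise ValueError("no stored pattern matches this query")
    return execute(engine, process(query))


if __name__ == "__main__":
    for q in (
        "SELECT name FROM users WHERE id = 1;",
        "db.users.find({'id': 1})",
        "MATCH (u:User) RETURN u",
    ):
        print(run(q))
```

In a Spark-based setting, the stand-in `execute` function is where a per-engine connector would be called; that part is deliberately left out here because the record does not describe it.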
format Online
Article
Text
id pubmed-8376024
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8376024 2021-08-20 An adaptive spark-based framework for querying large-scale NoSQL and relational databases Khashan, Eman Eldesouky, Ali Elghamrawy, Sally PLoS One Research Article Public Library of Science 2021-08-19 /pmc/articles/PMC8376024/ /pubmed/34411131 http://dx.doi.org/10.1371/journal.pone.0255562 Text en © 2021 Khashan et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An adaptive spark-based framework for querying large-scale NoSQL and relational databases
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8376024/
https://www.ncbi.nlm.nih.gov/pubmed/34411131
http://dx.doi.org/10.1371/journal.pone.0255562