A Cardinality Estimator in Complex Database Systems Based on TreeLSTM

Cardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accuratel...

Bibliographic Details
Main authors: Qi, Kaiyang; Yu, Jiong; He, Zhenzhen
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10490213/
https://www.ncbi.nlm.nih.gov/pubmed/37687820
http://dx.doi.org/10.3390/s23177364
author Qi, Kaiyang
Yu, Jiong
He, Zhenzhen
collection PubMed
description Cardinality estimation is critical for database management systems (DBMSs) to execute query optimization tasks, which can guide the query optimizer in choosing the best execution plan. However, traditional cardinality estimation methods cannot provide accurate estimates because they cannot accurately capture the correlation between multiple tables. Several recent studies have revealed that learning-based cardinality estimation methods can address the shortcomings of traditional methods and provide more accurate estimates. However, the learning-based cardinality estimation methods still have large errors when an SQL query involves multiple tables or is very complex. To address this problem, we propose a sampling-based tree long short-term memory (TreeLSTM) neural network to model queries. The proposed model addresses the weakness of traditional methods when no sampled tuples match the predicates and considers the join relationship between multiple tables and the conjunction and disjunction operations between predicates. We construct subexpressions as trees using operator types between predicates and improve the performance and accuracy of cardinality estimation by capturing the join-crossing correlations between tables and the order dependencies between predicates. In addition, we construct a new loss function to overcome the drawback that Q-error cannot distinguish between large and small cardinalities. Extensive experimental results from real-world datasets show that our proposed model improves the estimation quality and outperforms traditional cardinality estimation methods and the other compared deep learning methods in three evaluation metrics: Q-error, MAE, and SMAPE.
format Online
Article
Text
id pubmed-10490213
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10490213 2023-09-09 A Cardinality Estimator in Complex Database Systems Based on TreeLSTM Qi, Kaiyang Yu, Jiong He, Zhenzhen Sensors (Basel) Article MDPI 2023-08-23 /pmc/articles/PMC10490213/ /pubmed/37687820 http://dx.doi.org/10.3390/s23177364 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Cardinality Estimator in Complex Database Systems Based on TreeLSTM
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10490213/
https://www.ncbi.nlm.nih.gov/pubmed/37687820
http://dx.doi.org/10.3390/s23177364
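
The description in this record names three evaluation metrics (Q-error, MAE, and SMAPE) and notes that Q-error cannot distinguish between large and small cardinalities. The following is a minimal Python sketch of those metrics using their commonly cited definitions; it is not taken from the article. The function names, the clamping of values to 1 in `q_error`, and the particular SMAPE variant (range 0 to 2) are assumptions made for illustration, and the article's own loss function is not reproduced here.

```python
from typing import Sequence


def q_error(estimate: float, truth: float) -> float:
    """Factor by which an estimate is off from the truth; always >= 1."""
    # Assumption: clamp both values to 1 so empty results do not divide by zero.
    est = max(estimate, 1.0)
    tru = max(truth, 1.0)
    return max(est / tru, tru / est)


def mae(estimates: Sequence[float], truths: Sequence[float]) -> float:
    """Mean absolute error over paired estimates and true cardinalities."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


def smape(estimates: Sequence[float], truths: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error (variant with range 0..2)."""
    total = 0.0
    for e, t in zip(estimates, truths):
        denom = (abs(e) + abs(t)) / 2
        # Assumption: a pair where both values are 0 contributes no error.
        total += abs(e - t) / denom if denom else 0.0
    return total / len(truths)


if __name__ == "__main__":
    # Two hypothetical queries, each overestimated by a factor of two.
    small = (20.0, 10.0)                 # (estimate, true cardinality)
    large = (2_000_000.0, 1_000_000.0)
    print("Q-error:", q_error(*small), q_error(*large))
    print("MAE:", mae([small[0]], [small[1]]), mae([large[0]], [large[1]]))
```

In this sketch both hypothetical queries receive the same Q-error even though their absolute errors differ by five orders of magnitude, which illustrates the limitation the description mentions and why absolute and relative metrics are reported together.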