Cargando…

Data structure set-trie for storing and querying sets: Theoretical and empirical analysis

Set containment operations form an important tool in various fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a set-trie data structure for storing sets is considered, along with the efficient algorithms for the corresponding set...

Descripción completa

Detalles Bibliográficos
Autores principales: Savnik, Iztok, Akulich, Mikita, Krnc, Matjaž, Škrekovski, Riste
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875400/
https://www.ncbi.nlm.nih.gov/pubmed/33566827
http://dx.doi.org/10.1371/journal.pone.0245122
_version_ 1783649765884428288
author Savnik, Iztok
Akulich, Mikita
Krnc, Matjaž
Škrekovski, Riste
author_facet Savnik, Iztok
Akulich, Mikita
Krnc, Matjaž
Škrekovski, Riste
author_sort Savnik, Iztok
collection PubMed
description Set containment operations form an important tool in various fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a set-trie data structure for storing sets is considered, along with the efficient algorithms for the corresponding set containment operations. We present the mathematical and empirical study of the set-trie. In the mathematical study, the relevant upper-bounds on the efficiency of its expected performance are established by utilizing a natural probabilistic model. In the empirical study, we give insight into how different distributions of input data impact the efficiency of set-trie. Using the correct parameters for those randomly generated datasets, we expose the key sources of the input sensitivity of set-trie. Finally, the empirical comparison of set-trie with the inverted index is based on the real-world datasets containing sets of low cardinality. The comparison shows that the running time of set-trie consistently outperforms the inverted index by orders of magnitude.
format Online
Article
Text
id pubmed-7875400
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-78754002021-02-19 Data structure set-trie for storing and querying sets: Theoretical and empirical analysis Savnik, Iztok Akulich, Mikita Krnc, Matjaž Škrekovski, Riste PLoS One Research Article Set containment operations form an important tool in various fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a set-trie data structure for storing sets is considered, along with the efficient algorithms for the corresponding set containment operations. We present the mathematical and empirical study of the set-trie. In the mathematical study, the relevant upper-bounds on the efficiency of its expected performance are established by utilizing a natural probabilistic model. In the empirical study, we give insight into how different distributions of input data impact the efficiency of set-trie. Using the correct parameters for those randomly generated datasets, we expose the key sources of the input sensitivity of set-trie. Finally, the empirical comparison of set-trie with the inverted index is based on the real-world datasets containing sets of low cardinality. The comparison shows that the running time of set-trie consistently outperforms the inverted index by orders of magnitude. Public Library of Science 2021-02-10 /pmc/articles/PMC7875400/ /pubmed/33566827 http://dx.doi.org/10.1371/journal.pone.0245122 Text en © 2021 Savnik et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Savnik, Iztok
Akulich, Mikita
Krnc, Matjaž
Škrekovski, Riste
Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
title Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
title_full Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
title_fullStr Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
title_full_unstemmed Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
title_short Data structure set-trie for storing and querying sets: Theoretical and empirical analysis
title_sort data structure set-trie for storing and querying sets: theoretical and empirical analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7875400/
https://www.ncbi.nlm.nih.gov/pubmed/33566827
http://dx.doi.org/10.1371/journal.pone.0245122
work_keys_str_mv AT savnikiztok datastructuresettrieforstoringandqueryingsetstheoreticalandempiricalanalysis
AT akulichmikita datastructuresettrieforstoringandqueryingsetstheoreticalandempiricalanalysis
AT krncmatjaz datastructuresettrieforstoringandqueryingsetstheoreticalandempiricalanalysis
AT skrekovskiriste datastructuresettrieforstoringandqueryingsetstheoreticalandempiricalanalysis