Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling

RDF dataset profiles provide a formal representation of a dataset’s characteristics (features). These profiles may cover various aspects of the data represented in the dataset as well as statistical descriptors of the data distribution. In this work, we focus on the characteristic sets profile featu...

Bibliographic Details
Main authors: Heling, Lars; Acosta, Maribel
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250620/
http://dx.doi.org/10.1007/978-3-030-49461-2_10
author Heling, Lars
Acosta, Maribel
collection PubMed
description RDF dataset profiles provide a formal representation of a dataset’s characteristics (features). These profiles may cover various aspects of the data represented in the dataset as well as statistical descriptors of the data distribution. In this work, we focus on the characteristic sets profile feature, which summarizes the characteristic sets contained in an RDF graph. Because this type of feature provides detailed information on both the structure and semantics of RDF graphs, it can be very beneficial in query optimization. However, in decentralized query processing, computing it is challenging because accessing and processing all datasets is difficult and/or costly. To overcome this shortcoming, we propose the concept of profile feature estimation. We present sampling methods and projection functions to generate estimations that aim to be as similar as possible to the original characteristic sets profile feature. In our evaluation, we investigate the feasibility of the proposed methods on four RDF graphs. Our results show that samples containing [Formula: see text] of the entities in the graph allow for good estimations and may be used by downstream tasks such as query plan optimization in decentralized querying.
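To make the idea in the description concrete, the following is a minimal sketch, not the authors' method. It assumes the common definition of a characteristic set as the set of predicates that occur with a given subject, a toy in-memory list of `(subject, predicate, object)` triples, uniform entity sampling, and a simple projection that scales sampled counts by the ratio of total to sampled entities. The function names `characteristic_sets` and `estimate_characteristic_sets` are hypothetical, and the sketch assumes the total number of entities is known.

```python
import random
from collections import Counter, defaultdict


def _predicates_by_subject(triples):
    """Group the predicates of each subject (entity) into a set."""
    grouped = defaultdict(set)
    for subject, predicate, _obj in triples:
        grouped[subject].add(predicate)
    return grouped


def characteristic_sets(triples):
    """Count how many entities share each characteristic set (exact)."""
    grouped = _predicates_by_subject(triples)
    return Counter(frozenset(preds) for preds in grouped.values())


def estimate_characteristic_sets(triples, sample_size, seed=0):
    """Estimate the counts from a uniform sample of entities.

    Assumption: sampled counts are projected to the full graph by the
    factor (total entities / sampled entities). This is an illustrative
    choice, not necessarily one of the paper's projection functions.
    """
    grouped = _predicates_by_subject(triples)
    subjects = sorted(grouped)
    if not subjects or sample_size < 1:
        raise ValueError("need at least one entity and sample_size >= 1")
    rng = random.Random(seed)
    sample = rng.sample(subjects, min(sample_size, len(subjects)))
    sampled_counts = Counter(frozenset(grouped[s]) for s in sample)
    scale = len(subjects) / len(sample)
    return {cs: count * scale for cs, count in sampled_counts.items()}
```

With a sample that covers every entity, the estimate coincides with the exact counts; smaller samples trade accuracy for less data access, which is the trade-off the description refers to.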
format Online
Article
Text
id pubmed-7250620
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7250620 2020-05-27 Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling Heling, Lars Acosta, Maribel The Semantic Web Article 2020-05-07 /pmc/articles/PMC7250620/ http://dx.doi.org/10.1007/978-3-030-49461-2_10 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Heling, Lars
Acosta, Maribel
Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling
title Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling
topic Article