
SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata

Wikidata is a free and open knowledge base which can be read and edited by both humans and machines. It acts as a central storage for the structured data of several Wikimedia projects. To improve the process of manually inserting new facts, the Wikidata platform features an association rule-based to...


Bibliographic Details
Main Authors: Gleim, Lars C., Schimassek, Rafael, Hüser, Dominik, Peters, Maximilian, Krämer, Christoph, Cochez, Michael, Decker, Stefan
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250627/
http://dx.doi.org/10.1007/978-3-030-49461-2_11
_version_ 1783538800005218304
author Gleim, Lars C.
Schimassek, Rafael
Hüser, Dominik
Peters, Maximilian
Krämer, Christoph
Cochez, Michael
Decker, Stefan
author_sort Gleim, Lars C.
collection PubMed
description Wikidata is a free and open knowledge base which can be read and edited by both humans and machines. It acts as a central storage for the structured data of several Wikimedia projects. To improve the process of manually inserting new facts, the Wikidata platform features an association rule-based tool to recommend additional suitable properties. In this work, we introduce a novel approach to provide such recommendations based on frequentist inference. We introduce a trie-based method that can efficiently learn and represent property set probabilities in RDF graphs. We extend the method by adding type information to improve recommendation precision and introduce backoff strategies which further increase the performance of the initial approach for entities with rare property combinations. We investigate how the captured structure can be employed for property recommendation, analogously to the Wikidata PropertySuggester. We evaluate our approach on the full Wikidata dataset and compare its performance to the state-of-the-art Wikidata PropertySuggester, outperforming it in all evaluated metrics. Notably we could reduce the average rank of the first relevant recommendation by 71%.
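The trie-based idea in the description can be illustrated with a small sketch. This is not the authors' implementation: the class and method names (`PropertyTrie`, `support`, `recommend`) and the toy property sets are hypothetical, and the sketch leaves out the type information and backoff strategies the abstract mentions. It assumes that each entity's property set is stored as a path sorted by global property frequency, and that candidates are ranked by the relative frequency P(candidate | given properties) estimated from the stored counts.

```python
from collections import Counter


class Node:
    """One trie node: a property, a link to its parent, and a support count."""

    def __init__(self, prop, parent):
        self.prop = prop
        self.parent = parent
        self.count = 0
        self.children = {}


class PropertyTrie:
    """Hypothetical prefix tree over frequency-sorted property sets."""

    def __init__(self, property_sets):
        sets = [set(s) for s in property_sets]
        freq = Counter(p for s in sets for p in s)
        # Most frequent property first; ties broken by name for determinism.
        ordered = sorted(freq, key=lambda p: (-freq[p], p))
        self.rank = {p: i for i, p in enumerate(ordered)}
        self.root = Node(None, None)
        self.nodes_by_prop = {p: [] for p in self.rank}
        for s in sets:
            self.root.count += 1
            node = self.root
            for p in sorted(s, key=self.rank.__getitem__):
                child = node.children.get(p)
                if child is None:
                    child = Node(p, node)
                    node.children[p] = child
                    self.nodes_by_prop[p].append(child)
                child.count += 1
                node = child

    def support(self, props):
        """Number of stored property sets that contain every property in props."""
        props = set(props)
        if not props:
            return self.root.count
        if any(p not in self.rank for p in props):
            return 0
        # The lowest-ranked property sits deepest on any path that contains
        # the whole set, so the remaining ones must be among its ancestors.
        last = max(props, key=self.rank.__getitem__)
        needed = props - {last}
        total = 0
        for node in self.nodes_by_prop[last]:
            ancestors = set()
            anc = node.parent
            while anc is not None and anc.prop is not None:
                ancestors.add(anc.prop)
                anc = anc.parent
            if needed <= ancestors:
                total += node.count
        return total

    def recommend(self, given):
        """Rank unseen properties by estimated P(property | given)."""
        given = set(given)
        base = self.support(given)
        if base == 0:
            return []  # a backoff strategy would be needed here
        scored = []
        for p in self.rank:
            if p in given:
                continue
            prob = self.support(given | {p}) / base
            if prob > 0:
                scored.append((p, prob))
        return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data (made up for illustration): each set lists the properties of one entity.
toy_sets = [
    {"P31", "P21", "P569"},
    {"P31", "P21", "P569", "P19"},
    {"P31", "P21"},
    {"P31", "P17", "P625"},
    {"P31", "P17"},
]
trie = PropertyTrie(toy_sets)
for prop, prob in trie.recommend({"P31", "P21"}):
    print(prop, round(prob, 3))
```

For an entity that already has P31 and P21, the sketch ranks P569 ahead of P19, because two of the three matching toy sets contain P569 and only one contains P19. When the given combination never occurs in the data, `recommend` returns an empty list; that is the rare-combination case the backoff strategies in the abstract are meant to address.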
format Online
Article
Text
id pubmed-7250627
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7250627 2020-05-27
SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata
Gleim, Lars C.
Schimassek, Rafael
Hüser, Dominik
Peters, Maximilian
Krämer, Christoph
Cochez, Michael
Decker, Stefan
The Semantic Web
Article
Wikidata is a free and open knowledge base which can be read and edited by both humans and machines. It acts as a central storage for the structured data of several Wikimedia projects. To improve the process of manually inserting new facts, the Wikidata platform features an association rule-based tool to recommend additional suitable properties. In this work, we introduce a novel approach to provide such recommendations based on frequentist inference. We introduce a trie-based method that can efficiently learn and represent property set probabilities in RDF graphs. We extend the method by adding type information to improve recommendation precision and introduce backoff strategies which further increase the performance of the initial approach for entities with rare property combinations. We investigate how the captured structure can be employed for property recommendation, analogously to the Wikidata PropertySuggester. We evaluate our approach on the full Wikidata dataset and compare its performance to the state-of-the-art Wikidata PropertySuggester, outperforming it in all evaluated metrics. Notably we could reduce the average rank of the first relevant recommendation by 71%.
2020-05-07
/pmc/articles/PMC7250627/
http://dx.doi.org/10.1007/978-3-030-49461-2_11
Text
en
© Springer Nature Switzerland AG 2020
This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7250627/
http://dx.doi.org/10.1007/978-3-030-49461-2_11