Multi-source dataset of e-commerce products with attributes for property matching

Schema/ontology matching consists in finding matches between types, properties and entities in heterogeneous sources of data in order to integrate them, which has become increasingly relevant with the development of web technologies and open data initiatives. One of the involved tasks is the matchin...

Bibliographic Details
Main Authors:	Ayala, Daniel; Hernández, Inma; Ruiz, David; Rahm, Erhard
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Data Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8847803/ https://www.ncbi.nlm.nih.gov/pubmed/35198667 http://dx.doi.org/10.1016/j.dib.2022.107884

_version_	1784652125172662272
author	Ayala, Daniel Hernández, Inma Ruiz, David Rahm, Erhard
author_facet	Ayala, Daniel Hernández, Inma Ruiz, David Rahm, Erhard
author_sort	Ayala, Daniel
collection	PubMed
description	Schema/ontology matching consists of finding matches between types, properties and entities in heterogeneous sources of data in order to integrate them, which has become increasingly relevant with the development of web technologies and open data initiatives. One of the involved tasks is the matching of data properties, which attempts to find correspondences between the attributes of the entities. This is challenging because equivalent properties sometimes have different names. Furthermore, some properties may not be equivalent, but still match in 1..n relationships. These difficulties create the need for varied evaluation datasets for two reasons. First, they are needed to evaluate existing techniques in a variety of scenarios. Second, they enable the training of supervised techniques that may even become context-independent if trained with data from diverse enough contexts. To support the evaluation and training of data property matching techniques, we present a collection dataset consisting of product records from four different contexts. These datasets are the result of transforming two different existing datasets. In one of the datasets, some properties were filtered out for being too noisy. The resulting processed dataset consists of JSON files with a listing of the product records and their properties, and a separate grouping of the properties that determines which ones match. It contains information about 2860 entities, with 4386 properties and 13350 pairwise matches.
format	Online Article Text
id	pubmed-8847803
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-88478032022-02-22 Multi-source dataset of e-commerce products with attributes for property matching Ayala, Daniel Hernández, Inma Ruiz, David Rahm, Erhard Data Brief Data Article Schema/ontology matching consists in finding matches between types, properties and entities in heterogeneous sources of data in order to integrate them, which has become increasingly relevant with the development of web technologies and open data initiatives. One of the involved tasks is the matching of data properties, which attempts to try to find correspondences between the attributes of the entities. This is challenging due to the at times different names of equivalent properties. Furthermore, some properties may not be equivalent, but still match in 1..n relationships. These difficulties create the need for varied evaluation datasets for two reasons. First, they are needed to evaluate existing techniques in a variety of scenarios. Second, they enable the training of supervised techniques that may even become context-independent if trained with data from diverse enough contexts. To support the evaluation and training of data property matching techniques, we present a collection dataset consisting of product records from four different contexts. These datasets are the result of transforming two different existing datasets. In one of the datasets, some properties were filtered for being too noisy. The resulting processed dataset consists of json files with a listing of the product records and their properties, and a separate grouping of the properties that determines which ones match. It contains information about 2860 entities, with 4386 properties and 13350 pairwise matches. Elsevier 2022-02-02 /pmc/articles/PMC8847803/ /pubmed/35198667 http://dx.doi.org/10.1016/j.dib.2022.107884 Text en © 2022 The Authors. Published by Elsevier Inc. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Data Article Ayala, Daniel Hernández, Inma Ruiz, David Rahm, Erhard Multi-source dataset of e-commerce products with attributes for property matching
title	Multi-source dataset of e-commerce products with attributes for property matching
title_full	Multi-source dataset of e-commerce products with attributes for property matching
title_fullStr	Multi-source dataset of e-commerce products with attributes for property matching
title_full_unstemmed	Multi-source dataset of e-commerce products with attributes for property matching
title_short	Multi-source dataset of e-commerce products with attributes for property matching
title_sort	multi-source dataset of e-commerce products with attributes for property matching
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8847803/ https://www.ncbi.nlm.nih.gov/pubmed/35198667 http://dx.doi.org/10.1016/j.dib.2022.107884
work_keys_str_mv	AT ayaladaniel multisourcedatasetofecommerceproductswithattributesforpropertymatching AT hernandezinma multisourcedatasetofecommerceproductswithattributesforpropertymatching AT ruizdavid multisourcedatasetofecommerceproductswithattributesforpropertymatching AT rahmerhard multisourcedatasetofecommerceproductswithattributesforpropertymatching
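
The description in this record mentions JSON files that list product records with their properties, plus a separate grouping of properties that determines which ones match. The sketch below illustrates how such a grouping could be expanded into pairwise matches. The JSON layout, the key names (`records`, `properties`, `groups`) and the property identifiers are all hypothetical and chosen only for illustration; the actual files may be organized differently, and the published count of pairwise matches may follow a different counting rule (for example, excluding pairs from the same source).

```python
import json
from itertools import combinations

# Hypothetical layout: the real dataset files may use other keys and nesting.
SAMPLE = """
{
  "records": [
    {"id": "shopA-1",
     "properties": {"shopA/weight": "1.2 kg", "shopA/brand": "Acme"}},
    {"id": "shopB-7",
     "properties": {"shopB/item_weight": "1200 g", "shopB/manufacturer": "Acme"}}
  ],
  "groups": [
    ["shopA/weight", "shopB/item_weight", "shopC/mass"],
    ["shopA/brand", "shopB/manufacturer"]
  ]
}
"""


def pairwise_matches(groups):
    """Expand each group of matching properties into unordered pairs.

    Assumption: every two properties in the same group count as a match.
    """
    pairs = set()
    for group in groups:
        # Sort so each unordered pair has one canonical representation.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


data = json.loads(SAMPLE)
matches = pairwise_matches(data["groups"])

for a, b in sorted(matches):
    print(f"{a} <-> {b}")
```

Under this assumption a group of k properties contributes k(k-1)/2 pairs, so the sample yields three pairs from the first group and one from the second.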
