
Experimental data for computing semantic similarity between concepts using multiple inheritances in Wikipedia category graph

This data article compiles the detailed, descriptive experimental data for the Wikipedia-based semantic similarity approach called Neighbourhood Aggregated Semantic Contribution (NASC), presented in Hussain et al. [1]. The JWPL (Java Wikipedia Library)-DataMachine and JWPL WikipediaAPI are used to...


Bibliographic Details
Main authors: Hussain, Muhammad Jawad; Wasti, Shahbaz Hassan; Huang, Guangjian; Jiang, Yuncheng
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects: Computer Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118305/
https://www.ncbi.nlm.nih.gov/pubmed/32258267
http://dx.doi.org/10.1016/j.dib.2020.105377
author Hussain, Muhammad Jawad
Wasti, Shahbaz Hassan
Huang, Guangjian
Jiang, Yuncheng
collection PubMed
description This data article compiles the detailed, descriptive experimental data for the Wikipedia-based semantic similarity approach called Neighbourhood Aggregated Semantic Contribution (NASC), presented in Hussain et al. [1]. The JWPL (Java Wikipedia Library)-DataMachine and JWPL WikipediaAPI are used to extract the required Wikipedia features from a Wikipedia dump. The dataset presents the disambiguated Wikipedia concepts of the gold-standard word similarity benchmarks MC30 (English), RG65(es) (Spanish), and RG65(fr) (French), together with their associated sets of categories in the corresponding Wikipedia category graph (WCG). The dataset also contains the number of ancestors, common ancestors, pages, and common pages in the k-neighbourhood of the associated categories for different values of the parameter k in the English, Spanish, and French WCGs. The dataset can be used to assess the semantic similarity between Wikipedia concepts on the English (MC30), Spanish (RG65(es)), and French (RG65(fr)) benchmarks. It will also be useful for further analysis and comparison of the taxonomic structures of the English, Spanish, and French WCGs.
format Online
Article
Text
id pubmed-7118305
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7118305 2020-04-06 Data Brief Computer Science Elsevier 2020-03-10 /pmc/articles/PMC7118305/ /pubmed/32258267 http://dx.doi.org/10.1016/j.dib.2020.105377 Text en © 2020 The Authors. Published by Elsevier Inc. http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Experimental data for computing semantic similarity between concepts using multiple inheritances in Wikipedia category graph
topic Computer Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7118305/
https://www.ncbi.nlm.nih.gov/pubmed/32258267
http://dx.doi.org/10.1016/j.dib.2020.105377