Sense identification data: A dataset for lexical semantics

Sense Identification is a newly proposed task; in considering a pair of terms to assess their conceptual similarity, human raters are postulated to preliminarily select a sense pair. Senses involved in this pair are those actually subject to similarity rating. The sense identification task is search...

Bibliographic Details
Main Authors:	Colla, Davide, Mensa, Enrico, Radicioni, Daniele P.
Format:	Online Article Text
Language:	English
Published:	Elsevier 2020
Subjects:	Data Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7494475/ https://www.ncbi.nlm.nih.gov/pubmed/32984463 http://dx.doi.org/10.1016/j.dib.2020.106267

_version_	1783582758863372288
author	Colla, Davide Mensa, Enrico Radicioni, Daniele P.
author_facet	Colla, Davide Mensa, Enrico Radicioni, Daniele P.
author_sort	Colla, Davide
collection	PubMed
description	Sense identification is a newly proposed task. When assessing the conceptual similarity of a pair of terms, human raters are postulated to first select a sense pair; the senses in this pair are the ones actually subject to the similarity rating. The sense identification task consists of finding the senses selected during the similarity rating. The task is important for investigating the strategies and sense inventories underlying human lexical access and is, moreover, a relevant complement to the semantic similarity task. Identifying which senses are involved in the similarity rating is also crucial for fully assessing those ratings: if we have no idea which two senses were retrieved, on what basis can we assess the score expressing their semantic proximity? The Sense Identification Dataset (SID) has been built to provide a common experimental ground for systems and approaches dealing with the sense identification task. It is the first dataset specifically designed for experimenting on this task. The SID dataset was created by manually annotating with sense identifiers the term pairs from an existing dataset, the SemEval-2017 Task 2 English dataset. That dataset was originally conceived for experimenting on the semantic similarity task, and it contains a score expressing the human similarity rating for each term pair. For each such term pair we added a pair of annotated senses; in particular, senses were annotated such that they are compatible with (explicative of) the existing similarity ratings. The SID dataset contains BabelNet sense identifiers. This sense inventory is a broadly adopted ‘naming convention’ for word senses, and such identifiers can be easily mapped onto further resources such as WordNet and WikiData, thereby enabling further processing tasks and uses in the Natural Language Processing pipeline.
format	Online Article Text
id	pubmed-7494475
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-74944752020-09-24 Sense identification data: A dataset for lexical semantics Colla, Davide Mensa, Enrico Radicioni, Daniele P. Data Brief Data Article Sense Identification is a newly proposed task; in considering a pair of terms to assess their conceptual similarity, human raters are postulated to preliminarily select a sense pair. Senses involved in this pair are those actually subject to similarity rating. The sense identification task is searching for the sense selected during the similarity rating. The sense individuation task is important to investigate strategies and sense inventories underlying human lexical access and, moreover, it is a relevant complement to the semantic similarity task. Individuating which senses are involved in the similarity rating is also crucial in order to fully assess those ratings: if we have no idea of which two senses were retrieved, on which base can we assess the score expressing their semantic proximity? The Sense Identification Dataset (SID) dataset has been built to provide a common experimental ground to systems and approaches dealing with the sense identification task. It is the first dataset specifically designed for experimenting on the mentioned task. The SID dataset was created by manually annotating with sense identifiers the term pairs from an existing dataset, the SemEval-2017 Task 2 English dataset. The original dataset was originally conceived for experimenting on the semantic similarity task, and it contains a score expressing the human similarity rating for each term pair. For each such term pair we added a pair of annotated senses: in particular, senses were annotated such that they are compatible (explicative of) with the existing similarity ratings. The SID dataset contains BabelNet sense identifiers. 
This sense inventory is a broadly adopted ‘naming convention’ for word senses, and such identifiers can be easily mapped onto further resources such as WordNet and WikiData, thereby enabling further processing tasks and usages in the Natural Language Processing pipeline. Elsevier 2020-09-03 /pmc/articles/PMC7494475/ /pubmed/32984463 http://dx.doi.org/10.1016/j.dib.2020.106267 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Data Article Colla, Davide Mensa, Enrico Radicioni, Daniele P. Sense identification data: A dataset for lexical semantics
title	Sense identification data: A dataset for lexical semantics
title_full	Sense identification data: A dataset for lexical semantics
title_fullStr	Sense identification data: A dataset for lexical semantics
title_full_unstemmed	Sense identification data: A dataset for lexical semantics
title_short	Sense identification data: A dataset for lexical semantics
title_sort	sense identification data: a dataset for lexical semantics
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7494475/ https://www.ncbi.nlm.nih.gov/pubmed/32984463 http://dx.doi.org/10.1016/j.dib.2020.106267
work_keys_str_mv	AT colladavide senseidentificationdataadatasetforlexicalsemantics AT mensaenrico senseidentificationdataadatasetforlexicalsemantics AT radicionidanielep senseidentificationdataadatasetforlexicalsemantics
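
The description in the record says that each SID entry pairs two terms with a human similarity score and two BabelNet sense identifiers. A minimal sketch of how such an entry might be represented and checked is given here. The tab-separated layout, the field order, the class name `AnnotatedPair`, and the function `parse_line` are all assumptions made for illustration; the record does not state the actual file format. The identifiers in the example are placeholders, not real annotations from the dataset.

```python
import re
from dataclasses import dataclass

# BabelNet synset ids have the form "bn:" + 8 digits + a part-of-speech letter.
BABELNET_ID = re.compile(r"^bn:\d{8}[nvar]$")


@dataclass(frozen=True)
class AnnotatedPair:
    """One term pair with its similarity score and annotated senses (hypothetical layout)."""
    term1: str
    term2: str
    similarity: float
    sense1: str
    sense2: str


def parse_line(line: str) -> AnnotatedPair:
    """Parse one record, assuming five tab-separated fields in this order."""
    term1, term2, score, sense1, sense2 = line.rstrip("\n").split("\t")
    for sense in (sense1, sense2):
        if not BABELNET_ID.match(sense):
            raise ValueError(f"not a BabelNet id: {sense!r}")
    return AnnotatedPair(term1, term2, float(score), sense1, sense2)


# Placeholder values only: the score and ids below are made up for the sketch.
example = parse_line("bank\triver\t2.5\tbn:00000001n\tbn:00000002n")
print(example)
```

Validating the identifier shape at load time is a design choice of this sketch: it catches malformed rows early, before the senses are mapped onto other resources such as WordNet.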
