Event-Dataset: Temporal information retrieval and text classification dataset

Recently, Temporal Information Retrieval (TIR) has attracted considerable attention from the information retrieval community. TIR exploits temporal dynamics in the information retrieval process and harnesses both textual relevance and temporal relevance to fulfill the temporal information requirements...

Bibliographic Details
Main Authors:	Khan, Shafiq Ur Rehman; Islam, Muhammad Arshad
Format:	Online Article Text
Language:	English
Published:	Elsevier 2019
Subjects:	Computer Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6554222/ https://www.ncbi.nlm.nih.gov/pubmed/31194158 http://dx.doi.org/10.1016/j.dib.2019.104048

_version_	1783424926464606208
author	Khan, Shafiq Ur Rehman; Islam, Muhammad Arshad
author_facet	Khan, Shafiq Ur Rehman; Islam, Muhammad Arshad
author_sort	Khan, Shafiq Ur Rehman
collection	PubMed
description	Recently, Temporal Information Retrieval (TIR) has attracted considerable attention from the information retrieval community. TIR exploits temporal dynamics in the information retrieval process and harnesses both textual relevance and temporal relevance to fulfill the temporal information requirements of a user (Ur Rehman Khan et al., 2018). The focus time of a document is an important temporal aspect, defined as the time to which the content of the document refers (Jatowt et al., 2015; Jatowt et al., 2013; Morbidoni et al., 2018; Khan et al., 2018). To the best of our knowledge, no publicly available standard benchmark dataset exists that can comprehensively evaluate the performance of focus time assessment strategies. We have therefore produced the Event-dataset, which comprises 35 queries and a set of news articles for each query, such that [Formula: see text], where C represents the dataset, [Formula: see text] is the query set [Formula: see text], and for each [Formula: see text] there is a set of news articles [Formula: see text]. [Formula: see text] are the sets of relevant and non-relevant documents, respectively. Each query in the dataset represents a popular event. To annotate the articles as relevant or non-relevant, we employed a user-study-based evaluation method in which a group of postgraduate students manually annotated the articles. We believe this dataset gives information retrieval researchers a benchmark for evaluating focus time assessment methods specifically and information retrieval methods generally.
format	Online Article Text
id	pubmed-6554222
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-6554222 2019-06-10 Event-Dataset: Temporal information retrieval and text classification dataset Khan, Shafiq Ur Rehman; Islam, Muhammad Arshad Data Brief Computer Science Elsevier 2019-05-23 /pmc/articles/PMC6554222/ /pubmed/31194158 http://dx.doi.org/10.1016/j.dib.2019.104048 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Computer Science Khan, Shafiq Ur Rehman Islam, Muhammad Arshad Event-Dataset: Temporal information retrieval and text classification dataset
title	Event-Dataset: Temporal information retrieval and text classification dataset
title_full	Event-Dataset: Temporal information retrieval and text classification dataset
title_fullStr	Event-Dataset: Temporal information retrieval and text classification dataset
title_full_unstemmed	Event-Dataset: Temporal information retrieval and text classification dataset
title_short	Event-Dataset: Temporal information retrieval and text classification dataset
title_sort	event-dataset: temporal information retrieval and text classification dataset
topic	Computer Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6554222/ https://www.ncbi.nlm.nih.gov/pubmed/31194158 http://dx.doi.org/10.1016/j.dib.2019.104048
work_keys_str_mv	AT khanshafiqurrehman eventdatasettemporalinformationretrievalandtextclassificationdataset AT islammuhammadarshad eventdatasettemporalinformationretrievalandtextclassificationdataset
