HQA-Data: A historical question answer generation dataset from previous multi perspective conversation

This data article contains a quality assurance dataset for training chatbot and chat analysis models. The dataset focuses on NLP tasks in which a model serves and delivers a satisfactory response to a user's query. We obtained data from a well-known dataset known as “The Ubuntu Dialogue Co...

Bibliographic Details
Main authors: Hosen, Sabbir, Eva, Jannatul Ferdous, Hasib, Ayman, Saha, Aloke Kumar, Mridha, M.F., Wadud, Anwar Hussen
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10294004/
https://www.ncbi.nlm.nih.gov/pubmed/37383776
http://dx.doi.org/10.1016/j.dib.2023.109245
author Hosen, Sabbir
Eva, Jannatul Ferdous
Hasib, Ayman
Saha, Aloke Kumar
Mridha, M.F.
Wadud, Anwar Hussen
author_facet Hosen, Sabbir
Eva, Jannatul Ferdous
Hasib, Ayman
Saha, Aloke Kumar
Mridha, M.F.
Wadud, Anwar Hussen
author_sort Hosen, Sabbir
collection PubMed
description This data article contains a quality assurance dataset for training chatbot and chat analysis models. The dataset focuses on NLP tasks in which a model serves and delivers a satisfactory response to a user's query. We constructed our dataset from a well-known dataset, “The Ubuntu Dialogue Corpus”, which consists of about one million multi-turn conversations containing around seven million utterances and one hundred million words. We derived a context for each dialogueID from these lengthy Ubuntu Dialogue Corpus conversations and generated a number of questions and answers based on these contexts. All of these questions and answers are contained within the context. The dataset includes 9364 contexts and 36,438 question-answer pairs. In addition to academic research, the dataset may be used for activities such as constructing this QA for another language, deep learning, language interpretation, reading comprehension, and open-domain question answering. We present the data in raw format; it has been open sourced and is publicly available at https://data.mendeley.com/datasets/p85z3v45xk.
format Online
Article
Text
id pubmed-10294004
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10294004 2023-06-28 HQA-Data: A historical question answer generation dataset from previous multi perspective conversation Hosen, Sabbir Eva, Jannatul Ferdous Hasib, Ayman Saha, Aloke Kumar Mridha, M.F. Wadud, Anwar Hussen Data Brief Data Article This data article contains a quality assurance dataset for training chatbot and chat analysis models. The dataset focuses on NLP tasks in which a model serves and delivers a satisfactory response to a user's query. We constructed our dataset from a well-known dataset, “The Ubuntu Dialogue Corpus”, which consists of about one million multi-turn conversations containing around seven million utterances and one hundred million words. We derived a context for each dialogueID from these lengthy Ubuntu Dialogue Corpus conversations and generated a number of questions and answers based on these contexts. All of these questions and answers are contained within the context. The dataset includes 9364 contexts and 36,438 question-answer pairs. In addition to academic research, the dataset may be used for activities such as constructing this QA for another language, deep learning, language interpretation, reading comprehension, and open-domain question answering. We present the data in raw format; it has been open sourced and is publicly available at https://data.mendeley.com/datasets/p85z3v45xk. Elsevier 2023-05-18 /pmc/articles/PMC10294004/ /pubmed/37383776 http://dx.doi.org/10.1016/j.dib.2023.109245 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title HQA-Data: A historical question answer generation dataset from previous multi perspective conversation
topic Data Article