HQA-Data: A historical question answer generation dataset from previous multi perspective conversation
This data article contains a quality assurance dataset for training the chatbot and chat analysis model. This dataset focuses on NLP tasks, as a model that serves and delivers a satisfactory response to a user's query. We obtained data from a well-known dataset known as “The Ubuntu Dialogue Co...
Main authors: | Hosen, Sabbir; Eva, Jannatul Ferdous; Hasib, Ayman; Saha, Aloke Kumar; Mridha, M.F.; Wadud, Anwar Hussen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | Data Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10294004/ https://www.ncbi.nlm.nih.gov/pubmed/37383776 http://dx.doi.org/10.1016/j.dib.2023.109245 |
_version_ | 1785063106419884032 |
---|---|
author | Hosen, Sabbir Eva, Jannatul Ferdous Hasib, Ayman Saha, Aloke Kumar Mridha, M.F. Wadud, Anwar Hussen |
author_facet | Hosen, Sabbir Eva, Jannatul Ferdous Hasib, Ayman Saha, Aloke Kumar Mridha, M.F. Wadud, Anwar Hussen |
author_sort | Hosen, Sabbir |
collection | PubMed |
description | This data article presents a quality assurance dataset for training chatbots and chat analysis models. The dataset targets NLP tasks in which a model must deliver a satisfactory response to a user's query. We built it from the well-known “Ubuntu Dialogue Corpus”, which consists of about one million multi-turn conversations containing around seven million utterances and one hundred million words. We derived a context for each dialogueID from these lengthy Ubuntu Dialogue Corpus conversations and generated a number of questions and answers based on these contexts. All of these questions and answers are contained within the context. The dataset includes 9,364 contexts and 36,438 question-answer pairs. In addition to academic research, the dataset may be used for activities such as constructing this QA dataset for another language, deep learning, language interpretation, reading comprehension, and open-domain question answering. We present the data in raw format; it has been open-sourced and is publicly available at https://data.mendeley.com/datasets/p85z3v45xk. |
format | Online Article Text |
id | pubmed-10294004 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10294004 2023-06-28 HQA-Data: A historical question answer generation dataset from previous multi perspective conversation Hosen, Sabbir Eva, Jannatul Ferdous Hasib, Ayman Saha, Aloke Kumar Mridha, M.F. Wadud, Anwar Hussen Data Brief Data Article This data article contains a quality assurance dataset for training the chatbot and chat analysis model. This dataset focuses on NLP tasks, as a model that serves and delivers a satisfactory response to a user's query. We obtained data from a well-known dataset known as “The Ubuntu Dialogue Corpus” for the purpose of constructing our dataset. Which consists of about one million multi-turn conversations containing around seven million utterances and one hundred million words. We derived a context for each dialogueID from these lengthy Ubuntu Dialogue Corpus conversations. We have generated a number of questions and answers based on these contexts. All of these questions and answers are contained within the context. This dataset includes 9364 contexts, 36,438 question-answer pairs. In addition to academic research, the dataset may be used for activities such as constructing this QA for another language, deep learning, language interpretation, reading comprehension, and open-domain question answering. We present the data in raw format; it has been open sourced and publicly available at https://data.mendeley.com/datasets/p85z3v45xk. Elsevier 2023-05-18 /pmc/articles/PMC10294004/ /pubmed/37383776 http://dx.doi.org/10.1016/j.dib.2023.109245 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Data Article Hosen, Sabbir Eva, Jannatul Ferdous Hasib, Ayman Saha, Aloke Kumar Mridha, M.F. Wadud, Anwar Hussen HQA-Data: A historical question answer generation dataset from previous multi perspective conversation |
title | HQA-Data: A historical question answer generation dataset from previous multi perspective conversation |
title_full | HQA-Data: A historical question answer generation dataset from previous multi perspective conversation |
title_fullStr | HQA-Data: A historical question answer generation dataset from previous multi perspective conversation |
title_full_unstemmed | HQA-Data: A historical question answer generation dataset from previous multi perspective conversation |
title_short | HQA-Data: A historical question answer generation dataset from previous multi perspective conversation |
title_sort | hqa-data: a historical question answer generation dataset from previous multi perspective conversation |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10294004/ https://www.ncbi.nlm.nih.gov/pubmed/37383776 http://dx.doi.org/10.1016/j.dib.2023.109245 |
work_keys_str_mv | AT hosensabbir hqadataahistoricalquestionanswergenerationdatasetfrompreviousmultiperspectiveconversation AT evajannatulferdous hqadataahistoricalquestionanswergenerationdatasetfrompreviousmultiperspectiveconversation AT hasibayman hqadataahistoricalquestionanswergenerationdatasetfrompreviousmultiperspectiveconversation AT sahaalokekumar hqadataahistoricalquestionanswergenerationdatasetfrompreviousmultiperspectiveconversation AT mridhamf hqadataahistoricalquestionanswergenerationdatasetfrompreviousmultiperspectiveconversation AT wadudanwarhussen hqadataahistoricalquestionanswergenerationdatasetfrompreviousmultiperspectiveconversation |