
UDDIPOK: A reading comprehension based question answering dataset in Bangla language

The popularity of reading comprehension (RC) is increasing day by day in the Bangla Natural Language Processing (NLP) research area, in both machine learning and deep learning techniques. However, there is no original Bangla-language dataset drawn from various sources, only ones translated from foreign RC...


Bibliographic Details
Main Authors: Aurpa, Tanjim Taharat, Ahmed, Md Shoaib, Rifat, Richita Khandakar, Anwar, Md. Musfique, Shawkat Ali, A.B.M.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9929199/
https://www.ncbi.nlm.nih.gov/pubmed/36819905
http://dx.doi.org/10.1016/j.dib.2023.108933
_version_ 1784888797612212224
author Aurpa, Tanjim Taharat
Ahmed, Md Shoaib
Rifat, Richita Khandakar
Anwar, Md. Musfique
Shawkat Ali, A.B.M.
collection PubMed
description The popularity of reading comprehension (RC) is increasing day by day in the Bangla Natural Language Processing (NLP) research area, in both machine learning and deep learning techniques. However, there is no original Bangla-language dataset drawn from various sources, only datasets translated from foreign RC datasets, which contain abnormalities and mismatched translations. In this paper, we present UDDIPOK, a novel, wide-ranging, open-domain Bangla reading comprehension dataset. The dataset contains 270 reading passages, 3636 questions, and answers from diverse origins, such as textbooks, exam questions from middle and high schools, and newspapers. Furthermore, the dataset is formatted as CSV with three columns: passages, questions, and answers. As a result, the data can be handled quickly and easily in any machine learning research.
format Online
Article
Text
id pubmed-9929199
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9929199 2023-02-16 UDDIPOK: A reading comprehension based question answering dataset in Bangla language Aurpa, Tanjim Taharat Ahmed, Md Shoaib Rifat, Richita Khandakar Anwar, Md. Musfique Shawkat Ali, A.B.M. Data Brief Data Article The popularity of reading comprehension (RC) is increasing day by day in the Bangla Natural Language Processing (NLP) research area, in both machine learning and deep learning techniques. However, there is no original Bangla-language dataset drawn from various sources, only datasets translated from foreign RC datasets, which contain abnormalities and mismatched translations. In this paper, we present UDDIPOK, a novel, wide-ranging, open-domain Bangla reading comprehension dataset. The dataset contains 270 reading passages, 3636 questions, and answers from diverse origins, such as textbooks, exam questions from middle and high schools, and newspapers. Furthermore, the dataset is formatted as CSV with three columns: passages, questions, and answers. As a result, the data can be handled quickly and easily in any machine learning research. Elsevier 2023-02-02 /pmc/articles/PMC9929199/ /pubmed/36819905 http://dx.doi.org/10.1016/j.dib.2023.108933 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title UDDIPOK: A reading comprehension based question answering dataset in Bangla language
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9929199/
https://www.ncbi.nlm.nih.gov/pubmed/36819905
http://dx.doi.org/10.1016/j.dib.2023.108933