
UDDIPOK: A reading comprehension based question answering dataset in Bangla language


Bibliographic Details
Main Authors: Aurpa, Tanjim Taharat; Ahmed, Md Shoaib; Rifat, Richita Khandakar; Anwar, Md. Musfique; Shawkat Ali, A.B.M.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9929199/
https://www.ncbi.nlm.nih.gov/pubmed/36819905
http://dx.doi.org/10.1016/j.dib.2023.108933
Description
Summary: Reading comprehension (RC) is growing steadily in popularity in the Bangla Natural Language Processing (NLP) research area, in both machine learning and deep learning techniques. However, there is no original Bangla dataset drawn from varied sources; the only ones available are translated from foreign RC datasets and contain abnormalities and mismatched translations. In this paper, we present UDDIPOK, a novel, wide-ranging, open-domain Bangla reading comprehension dataset. This dataset contains 270 reading passages, 3636 questions, and answers from diverse origins, for instance textbooks, exam questions from middle and high schools, and newspapers. Furthermore, the dataset is formatted as CSV with three columns: passages, questions, and answers. As a result, the data can be handled quickly and easily in any machine learning research.
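The three-column CSV layout described in the summary can be read with Python's standard library. The following is a minimal sketch under stated assumptions: the header names `passages`, `questions`, and `answers` are taken from the summary's wording and may not match the actual file, the helper names `load_rc_rows` and `group_by_passage` are hypothetical, and the sample rows are placeholders, not data from UDDIPOK.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the dataset file. The real header names and
# contents may differ; these rows are placeholders, not UDDIPOK data.
SAMPLE_CSV = """passages,questions,answers
"Placeholder passage A.","Placeholder question 1?","Placeholder answer 1"
"Placeholder passage A.","Placeholder question 2?","Placeholder answer 2"
"Placeholder passage B.","Placeholder question 3?","Placeholder answer 3"
"""


def load_rc_rows(handle):
    """Read (passage, question, answer) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(row["passages"], row["questions"], row["answers"]) for row in reader]


def group_by_passage(rows):
    """Map each distinct passage to its list of (question, answer) pairs."""
    grouped = defaultdict(list)
    for passage, question, answer in rows:
        grouped[passage].append((question, answer))
    return dict(grouped)


rows = load_rc_rows(io.StringIO(SAMPLE_CSV))
grouped = group_by_passage(rows)
print(len(rows), "question-answer pairs over", len(grouped), "passages")
```

To read a real file instead of the in-memory sample, the handle could be opened with `open(path, encoding="utf-8", newline="")`, assuming the file is UTF-8 encoded (a common choice for Bangla text, but not confirmed by this record). Grouping by passage is one possible design choice: with 270 passages and 3636 questions, each passage is presumably repeated across several rows.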