BTSD: A curated transformation of sentence dataset for text classification in Bangla language

The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentence...

Bibliographic Details
Main Authors:	Das, Rajesh Kumar; Islam, Mirajul; Khushbu, Sharun Akter
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Data Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415831/ https://www.ncbi.nlm.nih.gov/pubmed/37577411 http://dx.doi.org/10.1016/j.dib.2023.109445

author	Das, Rajesh Kumar; Islam, Mirajul; Khushbu, Sharun Akter
collection	PubMed
description	The Bangla Transformation of Sentence Classification dataset addresses the resource gap in natural language processing (NLP) for the Bangla language by providing a curated resource for Bangla sentence classification. With 3,793 annotated sentences, the dataset focuses on categorizing Bangla sentences into Simple, Complex, and Compound classes. It serves as a benchmark for evaluating NLP models on Bangla sentence classification, promoting linguistic diversity and inclusive language models. Collected from publicly accessible Facebook pages, the dataset ensures balanced representation across the categories. Preprocessing steps, including anonymization and duplicate removal, were applied. Three native Bangla speakers independently assessed the Transformation of Sentence labels, enhancing the dataset's reliability. The dataset empowers researchers, practitioners, and developers to build accurate and robust NLP models tailored to the Bangla language. It offers insights into Bangla syntax and structure, benefiting linguistic research. The dataset can be used to train models, uncover patterns in Bangla language usage, and develop effective NLP applications across domains.
format	Online Article Text
id	pubmed-10415831
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-10415831; 2023-08-12; Data Brief; Data Article; Elsevier; 2023-07-24; /pmc/articles/PMC10415831/; /pubmed/37577411; http://dx.doi.org/10.1016/j.dib.2023.109445; Text; en; © 2023 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	BTSD: A curated transformation of sentence dataset for text classification in Bangla language
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415831/ https://www.ncbi.nlm.nih.gov/pubmed/37577411 http://dx.doi.org/10.1016/j.dib.2023.109445
