
BanglaWriting: A multi-purpose offline Bangla handwriting dataset

This article presents BanglaWriting, a Bangla handwriting dataset that contains single-page handwriting samples from 260 individuals of different personalities and ages. Each page includes bounding boxes that enclose each word, along with the Unicode representation of the writing. This dataset contains 2...


Bibliographic Details
Main Authors: Mridha, M.F., Ohi, Abu Quwsar, Ali, M. Ameer, Emon, Mazedul Islam, Kabir, Muhammad Mohsin
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7744928/
https://www.ncbi.nlm.nih.gov/pubmed/33354607
http://dx.doi.org/10.1016/j.dib.2020.106633
author Mridha, M.F.
Ohi, Abu Quwsar
Ali, M. Ameer
Emon, Mazedul Islam
Kabir, Muhammad Mohsin
collection PubMed
description This article presents BanglaWriting, a Bangla handwriting dataset that contains single-page handwriting samples from 260 individuals of different personalities and ages. Each page includes bounding boxes that enclose each word, along with the Unicode representation of the writing. The dataset contains 21,234 words and 32,787 characters in total, including 5,470 unique words of Bangla vocabulary. Apart from ordinary words, the dataset includes 261 instances of comprehensible overwriting and 450 handwritten strike-throughs and mistakes. All bounding boxes and word labels were generated manually. The dataset can be used for complex optical character/word recognition, writer identification, handwritten word segmentation, and word generation. It is also suitable for studying age-based and gender-based variation in handwriting.
format Online
Article
Text
id pubmed-7744928
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7744928 2020-12-21 Data Brief Data Article Elsevier 2020-12-09 /pmc/articles/PMC7744928/ /pubmed/33354607 http://dx.doi.org/10.1016/j.dib.2020.106633 Text en © 2020 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title BanglaWriting: A multi-purpose offline Bangla handwriting dataset
topic Data Article