DeepLontar dataset for handwritten Balinese character detection and syllable recognition on Lontar manuscript

Bibliographic Details
Main authors: Siahaan, Daniel; Sutramiani, Ni Putu; Suciati, Nanik; Duija, I Nengah; Darma, I Wayan Agus Surya
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9741579/
https://www.ncbi.nlm.nih.gov/pubmed/36496448
http://dx.doi.org/10.1038/s41597-022-01867-5
Description
Summary: The digitization of traditional Palmyra manuscripts, such as Lontar, is the government’s main focus in efforts to preserve Balinese culture. Digitization is done by acquiring Lontar manuscripts through photos or scans. To understand Lontar’s contents, experts usually carry out transliteration. Automatic transliteration using computer vision is generally carried out in several stages: character detection, character recognition, syllable recognition, and word recognition. Many methods can be used for detection and recognition, but they need data to train and evaluate the resulting model. To compile such a dataset, the data must be processed and labelled. This paper presents the data collection and dataset construction for detection and recognition tasks. Lontar manuscripts were collected from libraries at universities in Bali. Data generation was carried out to produce 400 augmented images from 200 original Lontar images to increase the variety of the data. Each character was annotated, producing over 100,000 labelled characters in 55 character classes. This dataset can be used to train and evaluate models for character detection and syllable recognition on new manuscripts.