A new dataset for mongolian online handwritten recognition
Main authors: | Pan, Yuecai; Fan, Daoerji; Wu, Huijuan; Teng, Da |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807600/ https://www.ncbi.nlm.nih.gov/pubmed/36593326 http://dx.doi.org/10.1038/s41598-022-27267-8 |
author | Pan, Yuecai; Fan, Daoerji; Wu, Huijuan; Teng, Da |
---|---|
collection | PubMed |
description | This paper introduces MOLHW, a new word-level online handwriting dataset for traditional Mongolian. The dataset contains 164,631 handwritten samples written by 200 writers and covers 40,605 common Mongolian words selected from a large Mongolian corpus. Volunteers wrote the assigned words in a dedicated mobile phone application, which collected the coordinate points of each word. The coordinates of each word are annotated with its Latin transliteration, and the dataset also records the writer's identification number and the mobile phone's screen information. Using this dataset, we propose a baseline encoder–decoder model for Mongolian online handwriting recognition, built on a deep bidirectional gated recurrent unit (GRU) with an attention mechanism. The best word error rate (WER) this model achieved on the test set was 24.281%. We also report experimental results for other Mongolian online handwriting recognition models: compared with the others, the Transformer-based model learned the mapping from the dataset's coordinate data to character sequences more effectively, reaching a 16.969% WER on the test set. The dataset is freely available to researchers worldwide and can be applied to handwritten text recognition as well as handwritten text generation, handwriting identification, and signature recognition. |
format | Online Article Text |
id | pubmed-9807600 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9807600 2023-01-04 A new dataset for mongolian online handwritten recognition Pan, Yuecai Fan, Daoerji Wu, Huijuan Teng, Da Sci Rep Article Nature Publishing Group UK 2023-01-02 /pmc/articles/PMC9807600/ /pubmed/36593326 http://dx.doi.org/10.1038/s41598-022-27267-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A new dataset for mongolian online handwritten recognition |
topic | Article |
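
The description says each MOLHW sample stores a word's coordinate points, its Latin transliteration, the writer's identification number, and the phone's screen information, but this record does not give the file layout. The sketch below assumes a hypothetical in-memory record (`OnlineSample` and all of its field names are illustrations, not the MOLHW format) and shows one plausible use of the screen information: scaling pixel coordinates by the screen size so that samples from phones with different resolutions share a common range, then turning absolute positions into point-to-point offsets. It ignores pen-up/pen-down information, which the real data may or may not contain.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class OnlineSample:
    """Hypothetical layout of one sample; NOT the actual MOLHW file format."""
    label: str           # Latin transliteration of the written word
    writer_id: int       # writer identification number
    screen_w: int        # phone screen width, assumed to be in pixels
    screen_h: int        # phone screen height, assumed to be in pixels
    points: List[Point]  # pen coordinates in writing order (pen-up/down ignored)

def normalize(sample: OnlineSample) -> List[Point]:
    """Divide by the recorded screen size; on-screen points map into [0, 1]."""
    return [(x / sample.screen_w, y / sample.screen_h) for x, y in sample.points]

def to_offsets(points: List[Point]) -> List[Point]:
    """Replace absolute positions with offsets from the previous point.
    The first point gets offset (0, 0)."""
    offsets: List[Point] = []
    prev = points[0] if points else (0.0, 0.0)
    for x, y in points:
        offsets.append((x - prev[0], y - prev[1]))
        prev = (x, y)
    return offsets

# Usage with made-up values (hypothetical label and screen size).
s = OnlineSample(label="mongol", writer_id=7, screen_w=1080, screen_h=2160,
                 points=[(540.0, 1080.0), (1080.0, 2160.0)])
print(to_offsets(normalize(s)))  # [(0.0, 0.0), (0.5, 0.5)]
```

Offsets are a common input representation for sequence models because they are invariant to where on the screen the word was written; whether the MOLHW baselines use them is not stated in this record.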
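
The reported results (24.281% and 16.969%) are word error rates. This record does not spell out the authors' exact definition, so the sketch below uses the standard one: the Levenshtein edit distance (substitutions, deletions, insertions) between reference and predicted word sequences, summed over the test set and divided by the total number of reference words. Because every MOLHW sample is a single word, this reduces to the fraction of samples whose predicted transliteration does not exactly match the label, assuming the model emits at most one word per sample. The example labels are hypothetical.

```python
from typing import List, Sequence

def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (single-row DP)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]

def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Total word-level edits divided by total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return edits / words

# Hypothetical single-word labels and predictions: one of three is wrong.
refs = ["mongol", "bichig", "nom"]
hyps = ["mongol", "bichik", "nom"]
print(corpus_wer(refs, hyps))  # 0.3333333333333333
```

Passing the raw strings instead of `.split()` token lists to `edit_distance` gives a character-level distance, which is the basis of a character error rate.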
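
The baseline is described as an encoder–decoder with a deep bidirectional GRU and an attention mechanism, and the best result came from a Transformer-based model. The record does not say which attention variant the GRU baseline uses, so the sketch below shows only the general idea using scaled dot-product attention (the form used in Transformers): a decoder query scores every encoder state, a softmax turns the scores into weights, and the context vector is the weighted sum of the encoder states. The function names are illustrative; in the GRU baseline the keys and values would be the bidirectional encoder outputs, and its attention may instead be additive.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """One decoder step of scaled dot-product attention:
    weight_i = softmax_i(query . key_i / sqrt(d)); context = sum_i weight_i * value_i.
    Assumes at least one key and equal-length keys/values lists."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# A zero query scores both encoder states equally, so the context is their mean.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 4.0]]
print(attend([0.0, 0.0], keys, values))  # [1.0, 2.0]
```

A query aligned strongly with one key pushes nearly all of the weight onto that encoder state, which is how the decoder focuses on the part of the pen trajectory relevant to the next character.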