Cargando…

Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks

The recognition of Arabic script and its derivatives such as Urdu, Persian, Pashto etc. is a difficult task due to complexity of this script. Particularly, Urdu text recognition is more difficult due to its Nasta’liq writing style. Nasta’liq writing style inherits complex calligraphic nature, which...

Descripción completa

Detalles Bibliográficos
Autores principales: Naz, Saeeda, Umar, Arif Iqbal, Ahmed, Riaz, Razzak, Muhammad Imran, Rashid, Sheikh Faisal, Shafait, Faisal
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer International Publishing 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122597/
https://www.ncbi.nlm.nih.gov/pubmed/27942426
http://dx.doi.org/10.1186/s40064-016-3442-4
_version_ 1782469609018884096
author Naz, Saeeda
Umar, Arif Iqbal
Ahmed, Riaz
Razzak, Muhammad Imran
Rashid, Sheikh Faisal
Shafait, Faisal
author_facet Naz, Saeeda
Umar, Arif Iqbal
Ahmed, Riaz
Razzak, Muhammad Imran
Rashid, Sheikh Faisal
Shafait, Faisal
author_sort Naz, Saeeda
collection PubMed
description The recognition of Arabic script and its derivatives such as Urdu, Persian, Pashto etc. is a difficult task due to complexity of this script. Particularly, Urdu text recognition is more difficult due to its Nasta’liq writing style. Nasta’liq writing style inherits complex calligraphic nature, which presents major issues to recognition of Urdu text owing to diagonality in writing, high cursiveness, context sensitivity and overlapping of characters. Therefore, the work done for recognition of Arabic script cannot be directly applied to Urdu recognition. We present Multi-dimensional Long Short Term Memory (MDLSTM) Recurrent Neural Networks with an output layer designed for sequence labeling for recognition of printed Urdu text-lines written in the Nasta’liq writing style. Experiments show that MDLSTM attained a recognition accuracy of 98% for the unconstrained Urdu Nasta’liq printed text, which significantly outperforms the state-of-the-art techniques.
format Online
Article
Text
id pubmed-5122597
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-51225972016-12-09 Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks Naz, Saeeda Umar, Arif Iqbal Ahmed, Riaz Razzak, Muhammad Imran Rashid, Sheikh Faisal Shafait, Faisal Springerplus Research The recognition of Arabic script and its derivatives such as Urdu, Persian, Pashto etc. is a difficult task due to complexity of this script. Particularly, Urdu text recognition is more difficult due to its Nasta’liq writing style. Nasta’liq writing style inherits complex calligraphic nature, which presents major issues to recognition of Urdu text owing to diagonality in writing, high cursiveness, context sensitivity and overlapping of characters. Therefore, the work done for recognition of Arabic script cannot be directly applied to Urdu recognition. We present Multi-dimensional Long Short Term Memory (MDLSTM) Recurrent Neural Networks with an output layer designed for sequence labeling for recognition of printed Urdu text-lines written in the Nasta’liq writing style. Experiments show that MDLSTM attained a recognition accuracy of 98% for the unconstrained Urdu Nasta’liq printed text, which significantly outperforms the state-of-the-art techniques. Springer International Publishing 2016-11-25 /pmc/articles/PMC5122597/ /pubmed/27942426 http://dx.doi.org/10.1186/s40064-016-3442-4 Text en © The Author(s) 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle Research
Naz, Saeeda
Umar, Arif Iqbal
Ahmed, Riaz
Razzak, Muhammad Imran
Rashid, Sheikh Faisal
Shafait, Faisal
Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
title Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
title_full Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
title_fullStr Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
title_full_unstemmed Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
title_short Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
title_sort urdu nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122597/
https://www.ncbi.nlm.nih.gov/pubmed/27942426
http://dx.doi.org/10.1186/s40064-016-3442-4
work_keys_str_mv AT nazsaeeda urdunastaliqtextrecognitionusingimplicitsegmentationbasedonmultidimensionallongshorttermmemoryneuralnetworks
AT umararifiqbal urdunastaliqtextrecognitionusingimplicitsegmentationbasedonmultidimensionallongshorttermmemoryneuralnetworks
AT ahmedriaz urdunastaliqtextrecognitionusingimplicitsegmentationbasedonmultidimensionallongshorttermmemoryneuralnetworks
AT razzakmuhammadimran urdunastaliqtextrecognitionusingimplicitsegmentationbasedonmultidimensionallongshorttermmemoryneuralnetworks
AT rashidsheikhfaisal urdunastaliqtextrecognitionusingimplicitsegmentationbasedonmultidimensionallongshorttermmemoryneuralnetworks
AT shafaitfaisal urdunastaliqtextrecognitionusingimplicitsegmentationbasedonmultidimensionallongshorttermmemoryneuralnetworks