Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research

Document analysis tasks, such as pattern recognition, word spotting or segmentation, require comprehensive databases for training and validation. Not only variations in writing style but also the list of words used is important when training samples should reflect the input of a speci...

Bibliographic Details
Main Authors: Dinges, Laslo, Al-Hamadi, Ayoub, Elzobi, Moftah, El-etriby, Sherif
Format: Online Article Text
Language: English
Published: MDPI 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4813921/
https://www.ncbi.nlm.nih.gov/pubmed/26978368
http://dx.doi.org/10.3390/s16030346
author Dinges, Laslo
Al-Hamadi, Ayoub
Elzobi, Moftah
El-etriby, Sherif
author_facet Dinges, Laslo
Al-Hamadi, Ayoub
Elzobi, Moftah
El-etriby, Sherif
author_sort Dinges, Laslo
collection PubMed
description Document analysis tasks, such as pattern recognition, word spotting or segmentation, require comprehensive databases for training and validation. Not only variations in writing style but also the list of words used is important when training samples should reflect the input of a specific area of application. However, generating training samples is expensive in terms of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, the second most popular language. Yet Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation-based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model-based character classifier, which we proposed earlier, improves the word recognition accuracy. Further improvements are achieved by using a vocabulary of the 50,000 most common Arabic words for error correction.
format Online
Article
Text
id pubmed-4813921
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-4813921 2016-04-06 Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research Dinges, Laslo Al-Hamadi, Ayoub Elzobi, Moftah El-etriby, Sherif Sensors (Basel) Article Document analysis tasks, such as pattern recognition, word spotting or segmentation, require comprehensive databases for training and validation. Not only variations in writing style but also the list of words used is important when training samples should reflect the input of a specific area of application. However, generating training samples is expensive in terms of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, the second most popular language. Yet Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation-based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model-based character classifier, which we proposed earlier, improves the word recognition accuracy. Further improvements are achieved by using a vocabulary of the 50,000 most common Arabic words for error correction. MDPI 2016-03-11 /pmc/articles/PMC4813921/ /pubmed/26978368 http://dx.doi.org/10.3390/s16030346 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Dinges, Laslo
Al-Hamadi, Ayoub
Elzobi, Moftah
El-etriby, Sherif
Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research
title Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4813921/
https://www.ncbi.nlm.nih.gov/pubmed/26978368
http://dx.doi.org/10.3390/s16030346