
ASM Based Synthesis of Handwritten Arabic Text Pages

Document analysis tasks, such as text recognition, word spotting, or segmentation, depend heavily on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. In fact, there is a lack of such databases, which comp...

Bibliographic Details
Main authors: Dinges, Laslo, Al-Hamadi, Ayoub, Elzobi, Moftah, El-etriby, Sherif, Ghoneim, Ahmed
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4534626/
https://www.ncbi.nlm.nih.gov/pubmed/26295059
http://dx.doi.org/10.1155/2015/323575
author Dinges, Laslo
Al-Hamadi, Ayoub
Elzobi, Moftah
El-etriby, Sherif
Ghoneim, Ahmed
collection PubMed
description Document analysis tasks, such as text recognition, word spotting, or segmentation, depend heavily on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. In fact, such databases are scarce, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with its own demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents together with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered taking writing speed and pen characteristics into account. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient natural ground-truthed data is not available.
format Online
Article
Text
id pubmed-4534626
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4534626 2015-08-20 ASM Based Synthesis of Handwritten Arabic Text Pages Dinges, Laslo Al-Hamadi, Ayoub Elzobi, Moftah El-etriby, Sherif Ghoneim, Ahmed ScientificWorldJournal Research Article Hindawi Publishing Corporation 2015 2015-07-30 /pmc/articles/PMC4534626/ /pubmed/26295059 http://dx.doi.org/10.1155/2015/323575 Text en Copyright © 2015 Laslo Dinges et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ASM Based Synthesis of Handwritten Arabic Text Pages
topic Research Article