Improved Two-Step Binarization of Degraded Document Images Based on Gaussian Mixture Model

Bibliographic Details
Main Authors: Krupiński, Robert, Lech, Piotr, Okarma, Krzysztof
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302549/
http://dx.doi.org/10.1007/978-3-030-50426-7_35
author Krupiński, Robert
Lech, Piotr
Okarma, Krzysztof
collection PubMed
description Image binarization is one of the most relevant preprocessing operations influencing the results of further image analysis conducted for many purposes. During this step a significant loss of information occurs, and the use of inappropriate thresholding methods may cause difficulties in further shape analysis or even make it impossible to recognize different shapes of objects or characters. Some of the most typical applications utilizing the analysis of binary images are Optical Character Recognition (OCR) and Optical Mark Recognition (OMR), which may also be applied to unevenly illuminated natural images, as well as to challenging degraded historical document images, which are considered typical benchmarks for image binarization algorithms. To address the still open challenge of relatively fast and simple, but robust, binarization of degraded document images, the paper proposes a novel two-step algorithm whose initial thresholding is based on modelling the simplified image histogram with a Gaussian Mixture Model (GMM) and the Monte Carlo method. This approach can be considered an extension of a recently developed image preprocessing method utilizing the Generalized Gaussian Distribution (GGD), based on the assumption of its similarity to the histograms of ground truth binary images distorted by Gaussian noise. The processing time of the first step, which produces intermediate images with partially removed background information, may be significantly reduced by the use of the Monte Carlo method. The proposed improved approach leads to even better results, not only for the well-known DIBCO benchmarking databases, but also for the more demanding Bickley Diary dataset, allowing the use of some well-known classical binarization methods, including global ones, in the second step of the algorithm.
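The two-step idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation, and the record does not give the algorithm's details. Several things here are assumptions made only for illustration: the function names, the two-component mixture fitted by plain EM directly to randomly sampled intensities (standing in for the simplified histogram), the cut at `k` standard deviations below the brighter component, and the choice of Otsu's method as the classical global thresholding for the second step.

```python
import math
import random


def sample_pixels(image, n_samples, rng):
    """Monte Carlo step: draw pixel intensities uniformly at random."""
    height, width = len(image), len(image[0])
    return [image[rng.randrange(height)][rng.randrange(width)]
            for _ in range(n_samples)]


def fit_gmm_1d(values, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    variances = [((hi - lo) / 4.0) ** 2 + 1.0] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in values:
            p = [weights[k]
                 * math.exp(-(v - means[k]) ** 2 / (2.0 * variances[k]))
                 / math.sqrt(2.0 * math.pi * variances[k])
                 for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2
                      for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1.0)  # floor keeps the Gaussian proper
    return weights, means, variances


def otsu_threshold(image):
    """Classical global Otsu threshold on 8-bit intensities."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def two_step_binarize(image, n_samples=2000, k=2.0, seed=0):
    """Step 1: GMM-based background removal; step 2: global thresholding."""
    rng = random.Random(seed)
    samples = sample_pixels(image, n_samples, rng)
    _, means, variances = fit_gmm_1d(samples)
    bg = max(range(2), key=lambda i: means[i])  # brighter = background (assumed)
    cut = means[bg] - k * math.sqrt(variances[bg])
    # Intermediate image: likely background pixels are pushed to white.
    intermediate = [[255 if p > cut else p for p in row] for row in image]
    t = otsu_threshold(intermediate)
    return [[0 if p <= t else 255 for p in row] for row in intermediate]
```

Here `image` is assumed to be a list of rows of integer intensities in the range 0 to 255, with dark text on a brighter background. Because only `n_samples` pixels enter the mixture fit, the cost of the first step does not grow with image size, which is the role the description attributes to the Monte Carlo method.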
format Online
Article
Text
id pubmed-7302549
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7302549 2020-06-19 Improved Two-Step Binarization of Degraded Document Images Based on Gaussian Mixture Model. Krupiński, Robert; Lech, Piotr; Okarma, Krzysztof. Computational Science – ICCS 2020. Article. 2020-05-25 /pmc/articles/PMC7302549/ http://dx.doi.org/10.1007/978-3-030-50426-7_35 Text en © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Improved Two-Step Binarization of Degraded Document Images Based on Gaussian Mixture Model
topic Article