Deep learning for terahertz image denoising in nondestructive historical document analysis

Bibliographic Details
Main authors: Dutta, Balaka, Root, Konstantin, Ullmann, Ingrid, Wagner, Fabian, Mayr, Martin, Seuret, Mathias, Thies, Mareike, Stromer, Daniel, Christlein, Vincent, Schür, Jan, Maier, Andreas, Huang, Yixing
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9800433/
https://www.ncbi.nlm.nih.gov/pubmed/36581647
http://dx.doi.org/10.1038/s41598-022-26957-7
collection PubMed
description Historical documents contain essential information about the past, including places, people, and events. Many of these valuable cultural artifacts cannot be examined further because aging or external influences have made them too fragile to be opened or turned over, so their rich contents remain hidden. Terahertz (THz) imaging is a nondestructive 3D imaging technique that can reveal these hidden contents without damaging the documents. Because noise and imaging artifacts are prevalent in images reconstructed by standard THz reconstruction algorithms, this work aims to improve THz image quality with deep learning. To overcome the data scarcity problem in training a supervised deep learning model, an unsupervised deep learning network (CycleGAN) is first applied to generate paired noisy THz images from clean images (the clean images are produced by a handwriting generator). With these synthetic noisy-to-clean image pairs, a supervised deep learning model based on Pix2pixGAN is trained, which effectively enhances real noisy THz images. After Pix2pixGAN denoising, 99% of the characters written on one side of Xuan paper can be clearly recognized, while 61% of the characters written on one side of standard paper are sufficiently recognized. The average perceptual index of the Pix2pixGAN-processed images is 16.83, very close to the average perceptual index of 16.19 for clean handwriting images. This work has important value for THz-imaging-based nondestructive historical document analysis.
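The two-stage data strategy in the abstract (synthesize noisy/clean pairs with an unpaired model, then train a paired model on them) can be sketched as below. This is a minimal illustration assuming NumPy, not the authors' code. The trained CycleGAN clean-to-noisy generator is replaced by a hypothetical `fake_cyclegan_clean_to_noisy` stand-in that only adds Gaussian noise. The loss is the standard Pix2pix generator objective (non-saturating adversarial term plus a λ-weighted L1 term, λ = 100 as in the original Pix2pix work), which may differ from the configuration used in this paper.

```python
import numpy as np


def fake_cyclegan_clean_to_noisy(clean, rng, sigma=0.2):
    """Hypothetical stand-in for a trained CycleGAN clean->noisy generator.

    A real generator would learn THz-specific noise and artifacts; here we
    simply add Gaussian noise and clip to [0, 1] for illustration.
    """
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0)


def make_synthetic_pairs(clean_images, to_noisy, seed=0):
    """Stage 1: turn unpaired clean images into (noisy input, clean target) pairs."""
    rng = np.random.default_rng(seed)
    return [(to_noisy(clean, rng), clean) for clean in clean_images]


def pix2pix_generator_loss(d_fake_logits, fake, target, lam=100.0):
    """Stage 2: standard Pix2pix generator objective for one synthetic pair.

    Adversarial term: -log(sigmoid(D(x, G(x)))), computed stably as
    log(1 + exp(-logit)) with np.logaddexp. Reconstruction term: mean L1.
    """
    adversarial = np.mean(np.logaddexp(0.0, -d_fake_logits))
    l1 = np.mean(np.abs(fake - target))
    return adversarial + lam * l1


if __name__ == "__main__":
    # Hypothetical "handwriting generator" output: sparse random binary strokes.
    rng = np.random.default_rng(42)
    clean_images = [(rng.random((64, 64)) > 0.9).astype(float) for _ in range(4)]

    pairs = make_synthetic_pairs(clean_images, fake_cyclegan_clean_to_noisy)
    noisy, clean = pairs[0]

    # Placeholders: an identity "denoiser" output and a neutral discriminator (logit 0).
    loss = pix2pix_generator_loss(np.zeros((8, 8)), noisy, clean)
    print(f"{len(pairs)} pairs, generator loss = {loss:.3f}")
```

In a real setup, `to_noisy` would be the trained CycleGAN generator, and the loss would be minimized with respect to the parameters of a convolutional generator in a deep learning framework rather than evaluated on fixed arrays.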
id pubmed-9800433
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9800433 2022-12-31 Sci Rep Article
Nature Publishing Group UK 2022-12-29 /pmc/articles/PMC9800433/ /pubmed/36581647 http://dx.doi.org/10.1038/s41598-022-26957-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.