
A Transformer-Based Model for Super-Resolution of Anime Image

Image super-resolution (ISR) technology aims to enhance resolution and improve image quality. It is widely used in real-world image-processing applications, especially medical imaging, but has seen relatively little use in anime image production. Furthermore, contemporary ISR too...


Bibliographic Details
Main Authors:	Xu, Shizhuo; Dutta, Vibekananda; He, Xin; Matsumaru, Takafumi
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9657210/ https://www.ncbi.nlm.nih.gov/pubmed/36365830 http://dx.doi.org/10.3390/s22218126

_version_	1784829634950463488
author	Xu, Shizhuo Dutta, Vibekananda He, Xin Matsumaru, Takafumi
author_facet	Xu, Shizhuo Dutta, Vibekananda He, Xin Matsumaru, Takafumi
author_sort	Xu, Shizhuo
collection	PubMed
description	Image super-resolution (ISR) technology aims to enhance resolution and improve image quality. It is widely used in real-world image-processing applications, especially medical imaging, but has seen relatively little use in anime image production. Furthermore, contemporary ISR tools are often based on convolutional neural networks (CNNs), while few methods attempt to use transformers, which perform well in other advanced vision tasks. In this work, we propose an anime image super-resolution (AISR) method based on the Swin Transformer. The work was carried out in several stages. First, shallow feature extraction was employed to obtain a feature map of the input image’s low-frequency information, which mainly approximates the distribution of detailed information in the spatial structure (shallow feature). Next, we applied deep feature extraction to extract the image's semantic information (deep feature). Finally, the image reconstruction method combines the shallow and deep features to upsample the feature size and performs sub-pixel convolution to obtain many feature map channels. The novelty of the proposal is the enhancement of the low-frequency information using a Gaussian filter and the introduction of different window sizes to replace the patch merging operations in the Swin Transformer. A high-quality anime dataset was constructed to curb the effects of the model robustness on the online regime. We trained our model on this dataset and tested the model quality. We performed anime image super-resolution at different magnifications (2×, 4×, 8×). The results were compared numerically and graphically with those of conventional CNN-based and transformer-based methods. We evaluated the experiments numerically using the standard peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics. The series of experiments and the ablation study show that our proposal outperforms the others.
format	Online Article Text
id	pubmed-9657210
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9657210 2022-11-15 A Transformer-Based Model for Super-Resolution of Anime Image Xu, Shizhuo Dutta, Vibekananda He, Xin Matsumaru, Takafumi Sensors (Basel) Article MDPI 2022-10-24 /pmc/articles/PMC9657210/ /pubmed/36365830 http://dx.doi.org/10.3390/s22218126 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Xu, Shizhuo Dutta, Vibekananda He, Xin Matsumaru, Takafumi A Transformer-Based Model for Super-Resolution of Anime Image
title	A Transformer-Based Model for Super-Resolution of Anime Image
title_full	A Transformer-Based Model for Super-Resolution of Anime Image
title_fullStr	A Transformer-Based Model for Super-Resolution of Anime Image
title_full_unstemmed	A Transformer-Based Model for Super-Resolution of Anime Image
title_short	A Transformer-Based Model for Super-Resolution of Anime Image
title_sort	transformer-based model for super-resolution of anime image
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9657210/ https://www.ncbi.nlm.nih.gov/pubmed/36365830 http://dx.doi.org/10.3390/s22218126
