Vision Transformers in Image Restoration: A Survey

The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better versio...

Bibliographic Details
Main authors: Ali, Anas M., Benjdira, Bilel, Koubaa, Anis, El-Shafai, Walid, Khan, Zahid, Boulila, Wadii
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006889/
https://www.ncbi.nlm.nih.gov/pubmed/36904589
http://dx.doi.org/10.3390/s23052385
author Ali, Anas M.
Benjdira, Bilel
Koubaa, Anis
El-Shafai, Walid
Khan, Zahid
Boulila, Wadii
collection PubMed
description The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better version of an image given in a low-quality format. In this study, the efficiency of ViT in image restoration is studied extensively. The ViT architectures are classified for every task of image restoration. Seven image restoration tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, the advantages, the limitations, and the possible areas for future research are detailed. Overall, it is noted that incorporating ViT in the new architectures for image restoration is becoming a rule. This is due to some advantages compared to CNN, such as better efficiency, especially when more data are fed to the network, robustness in feature extraction, and a better feature learning approach that sees better the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViT over CNN, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks represent the future research direction that should be targeted to increase the efficiency of ViT in the image restoration domain.
format Online
Article
Text
id pubmed-10006889
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10006889 2023-03-12 Vision Transformers in Image Restoration: A Survey Ali, Anas M. Benjdira, Bilel Koubaa, Anis El-Shafai, Walid Khan, Zahid Boulila, Wadii Sensors (Basel) Article MDPI 2023-02-21 /pmc/articles/PMC10006889/ /pubmed/36904589 http://dx.doi.org/10.3390/s23052385 Text en © 2023 by the authors.
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Vision Transformers in Image Restoration: A Survey
topic Article