Vision Transformers in Image Restoration: A Survey
The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better versio...
Main Authors: | Ali, Anas M.; Benjdira, Bilel; Koubaa, Anis; El-Shafai, Walid; Khan, Zahid; Boulila, Wadii |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006889/ https://www.ncbi.nlm.nih.gov/pubmed/36904589 http://dx.doi.org/10.3390/s23052385 |
_version_ | 1784905382631571456 |
---|---|
author | Ali, Anas M. Benjdira, Bilel Koubaa, Anis El-Shafai, Walid Khan, Zahid Boulila, Wadii |
author_sort | Ali, Anas M. |
collection | PubMed |
description | The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better version of an image given in a low-quality format. In this study, the efficiency of ViT in image restoration is studied extensively. The ViT architectures are classified for every task of image restoration. Seven image restoration tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, the advantages, the limitations, and the possible areas for future research are detailed. Overall, it is noted that incorporating ViT in the new architectures for image restoration is becoming a rule. This is due to some advantages compared to CNN, such as better efficiency, especially when more data are fed to the network, robustness in feature extraction, and a better feature learning approach that sees better the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViT over CNN, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks represent the future research direction that should be targeted to increase the efficiency of ViT in the image restoration domain. |
format | Online Article Text |
id | pubmed-10006889 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-100068892023-03-12 Vision Transformers in Image Restoration: A Survey Ali, Anas M. Benjdira, Bilel Koubaa, Anis El-Shafai, Walid Khan, Zahid Boulila, Wadii Sensors (Basel) Article The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a while, Convolutional Neural Networks (CNN) predominated in most computer vision tasks. Now, both CNN and ViT are efficient approaches that demonstrate powerful capabilities to restore a better version of an image given in a low-quality format. In this study, the efficiency of ViT in image restoration is studied extensively. The ViT architectures are classified for every task of image restoration. Seven image restoration tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, the advantages, the limitations, and the possible areas for future research are detailed. Overall, it is noted that incorporating ViT in the new architectures for image restoration is becoming a rule. This is due to some advantages compared to CNN, such as better efficiency, especially when more data are fed to the network, robustness in feature extraction, and a better feature learning approach that sees better the variances and characteristics of the input. Nevertheless, some drawbacks exist, such as the need for more data to show the benefits of ViT over CNN, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks represent the future research direction that should be targeted to increase the efficiency of ViT in the image restoration domain. MDPI 2023-02-21 /pmc/articles/PMC10006889/ /pubmed/36904589 http://dx.doi.org/10.3390/s23052385 Text en © 2023 by the authors. 
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Vision Transformers in Image Restoration: A Survey |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10006889/ https://www.ncbi.nlm.nih.gov/pubmed/36904589 http://dx.doi.org/10.3390/s23052385 |