Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment
To address the challenge of no-reference image quality assessment (NR-IQA) for authentically and synthetically distorted images, we propose a novel network called the Combining Convolution and Self-Attention for Image Quality Assessment network (Conv-Former). Our model uses a multi-stage transformer...
Main authors: | Han, Lintao; Lv, Hengyi; Zhao, Yuchen; Liu, Hailong; Bi, Guoling; Yin, Zhiyong; Fang, Yuqiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824537/ https://www.ncbi.nlm.nih.gov/pubmed/36617024 http://dx.doi.org/10.3390/s23010427 |
_version_ | 1784866434475622400 |
---|---|
author | Han, Lintao Lv, Hengyi Zhao, Yuchen Liu, Hailong Bi, Guoling Yin, Zhiyong Fang, Yuqiang |
author_facet | Han, Lintao Lv, Hengyi Zhao, Yuchen Liu, Hailong Bi, Guoling Yin, Zhiyong Fang, Yuqiang |
author_sort | Han, Lintao |
collection | PubMed |
description | To address the challenge of no-reference image quality assessment (NR-IQA) for authentically and synthetically distorted images, we propose a novel network called the Combining Convolution and Self-Attention for Image Quality Assessment network (Conv-Former). Our model uses a multi-stage transformer architecture similar to that of ResNet-50 to represent appropriate perceptual mechanisms in image quality assessment (IQA) and thereby build an accurate IQA model. We employ adaptive learnable position embedding to handle images with arbitrary resolution. We propose a new transformer block (TB) that takes advantage of transformers to capture long-range dependencies and of local information perception (LIP) to model local features for enhanced representation learning. The module increases the model’s understanding of the image content. Dual path pooling (DPP) is used to keep more contextual image quality information during feature downsampling. Experimental results verify that Conv-Former not only outperforms the state-of-the-art methods on authentic image databases but also achieves competitive performance on synthetic image databases, demonstrating the strong fitting performance and generalization capability of our proposed model. |
format | Online Article Text |
id | pubmed-9824537 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-98245372023-01-08 Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment Han, Lintao Lv, Hengyi Zhao, Yuchen Liu, Hailong Bi, Guoling Yin, Zhiyong Fang, Yuqiang Sensors (Basel) Article To address the challenge of no-reference image quality assessment (NR-IQA) for authentically and synthetically distorted images, we propose a novel network called the Combining Convolution and Self-Attention for Image Quality Assessment network (Conv-Former). Our model uses a multi-stage transformer architecture similar to that of ResNet-50 to represent appropriate perceptual mechanisms in image quality assessment (IQA) and thereby build an accurate IQA model. We employ adaptive learnable position embedding to handle images with arbitrary resolution. We propose a new transformer block (TB) that takes advantage of transformers to capture long-range dependencies and of local information perception (LIP) to model local features for enhanced representation learning. The module increases the model’s understanding of the image content. Dual path pooling (DPP) is used to keep more contextual image quality information during feature downsampling. Experimental results verify that Conv-Former not only outperforms the state-of-the-art methods on authentic image databases but also achieves competitive performance on synthetic image databases, demonstrating the strong fitting performance and generalization capability of our proposed model. MDPI 2022-12-30 /pmc/articles/PMC9824537/ /pubmed/36617024 http://dx.doi.org/10.3390/s23010427 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Han, Lintao Lv, Hengyi Zhao, Yuchen Liu, Hailong Bi, Guoling Yin, Zhiyong Fang, Yuqiang Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment |
title | Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824537/ https://www.ncbi.nlm.nih.gov/pubmed/36617024 http://dx.doi.org/10.3390/s23010427 |
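
The description mentions dual path pooling (DPP) for downsampling and an adaptive learnable position embedding for arbitrary resolutions, but this record does not give their formulas. The sketch below is only an illustration under stated assumptions: DPP is assumed to sum a 2x2 max-pooled path and a 2x2 average-pooled path, and the position embedding is assumed to be a learned grid (one scalar per position, for brevity) that is bilinearly resized to the target grid. All function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; the fusion rule and resize scheme are assumptions,
# not the method published in the Conv-Former paper.

def max_pool_2x2(fmap):
    """2x2 max pooling, stride 2, on a 2-D list with even height and width."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def avg_pool_2x2(fmap):
    """2x2 average pooling, stride 2, on a 2-D list with even height and width."""
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def dual_path_pool(fmap):
    """Assumed fusion: element-wise sum of the max-pooled and average-pooled paths."""
    pooled_max = max_pool_2x2(fmap)
    pooled_avg = avg_pool_2x2(fmap)
    return [[m + a for m, a in zip(row_m, row_a)]
            for row_m, row_a in zip(pooled_max, pooled_avg)]


def resize_position_grid(grid, out_h, out_w):
    """Bilinearly resize a learned position grid to (out_h, out_w), corners aligned."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # Map the output row index to a fractional source row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            # Map the output column index to a fractional source column.
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


if __name__ == "__main__":
    feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(dual_path_pool(feature_map))
    print(resize_position_grid([[0.0, 1.0], [2.0, 3.0]], 3, 3))
```

Keeping both pooled paths is one plausible way to retain peak responses and average context at the same time; a real model would apply this per channel and would resize an embedding vector at every grid position rather than a single scalar.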