
Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment

To address the challenge of no-reference image quality assessment (NR-IQA) for authentically and synthetically distorted images, we propose a novel network called the Combining Convolution and Self-Attention for Image Quality Assessment network (Conv-Former). Our model uses a multi-stage transformer architecture similar to that of ResNet-50 to represent appropriate perceptual mechanisms in image quality assessment (IQA) and build an accurate IQA model. We employ adaptive learnable position embedding to handle images of arbitrary resolution. We propose a new transformer block (TB) that takes advantage of transformers to capture long-range dependencies and of local information perception (LIP) to model local features for enhanced representation learning. The module increases the model's understanding of the image content. Dual path pooling (DPP) is used to keep more contextual image quality information during feature downsampling. Experimental results verify that Conv-Former not only outperforms state-of-the-art methods on authentic image databases, but also achieves competitive performance on synthetic image databases, which demonstrates the strong fitting performance and generalization capability of our proposed model.
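
Since this record contains only the abstract, the following is a minimal, illustrative sketch (not the authors' released code) of how a transformer block might pair global self-attention with a convolutional local information perception (LIP) branch, and how dual path pooling (DPP) might combine average and max pooling when downsampling. All class names, shapes, and hyperparameters below are assumptions made for illustration.

import torch
import torch.nn as nn

class LocalInformationPerception(nn.Module):
    # Hypothetical LIP branch: a depthwise convolution that models local structure.
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.BatchNorm2d(dim)

    def forward(self, x):                                 # x: (B, C, H, W)
        return self.norm(self.dwconv(x))

class TransformerBlock(nn.Module):
    # Sketch of a TB: self-attention for long-range dependencies plus LIP for local features.
    def __init__(self, dim, heads=4, mlp_ratio=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lip = LocalInformationPerception(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (B, H*W, C)
        q = self.norm1(tokens)
        attn_out, _ = self.attn(q, q, q)                  # global branch: self-attention
        local = self.lip(x).flatten(2).transpose(1, 2)    # local branch: LIP as tokens
        tokens = tokens + attn_out + local                # fuse global and local paths
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class DualPathPooling(nn.Module):
    # Sketch of DPP: average and max pooling summed to retain more context while downsampling.
    def forward(self, x):
        return nn.functional.avg_pool2d(x, 2) + nn.functional.max_pool2d(x, 2)

feat = torch.randn(1, 64, 28, 28)                         # dummy stage feature map
feat = TransformerBlock(dim=64)(feat)
print(DualPathPooling()(feat).shape)                      # torch.Size([1, 64, 14, 14])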


Bibliographic Details
Main Authors: Han, Lintao, Lv, Hengyi, Zhao, Yuchen, Liu, Hailong, Bi, Guoling, Yin, Zhiyong, Fang, Yuqiang
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9824537/
https://www.ncbi.nlm.nih.gov/pubmed/36617024
http://dx.doi.org/10.3390/s23010427
_version_ 1784866434475622400
id pubmed-9824537
collection PubMed
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sensors (Basel)
publishDate 2022-12-30
publisher MDPI
license © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).