ESAMask: Real-Time Instance Segmentation Fused with Efficient Sparse Attention

Instance segmentation is a challenging task in computer vision, as it requires distinguishing objects and predicting dense areas. Currently, segmentation models based on complex designs and large parameters have achieved remarkable accuracy. However, from a practical standpoint, achieving a balance...

Bibliographic Details
Main authors: Zhang, Qian; Chen, Lu; Shao, Mingwen; Liang, Hong; Ren, Jie
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385500/
https://www.ncbi.nlm.nih.gov/pubmed/37514740
http://dx.doi.org/10.3390/s23146446
author Zhang, Qian
Chen, Lu
Shao, Mingwen
Liang, Hong
Ren, Jie
collection PubMed
description Instance segmentation is a challenging task in computer vision, as it requires distinguishing objects and predicting dense areas. Currently, segmentation models based on complex designs and large parameters have achieved remarkable accuracy. However, from a practical standpoint, achieving a balance between accuracy and speed is even more desirable. To address this need, this paper presents ESAMask, a real-time segmentation model fused with efficient sparse attention, which adheres to the principles of lightweight design and efficiency. In this work, we propose several key contributions. Firstly, we introduce a dynamic and sparse Related Semantic Perceived Attention mechanism (RSPA) for adaptive perception of different semantic information of various targets during feature extraction. RSPA uses the adjacency matrix to search for regions with high semantic correlation of the same target, which reduces computational cost. Additionally, we design the GSInvSAM structure to reduce redundant calculations of spliced features while enhancing interaction between channels when merging feature layers of different scales. Lastly, we introduce the Mixed Receptive Field Context Perception Module (MRFCPM) in the prototype branch to enable targets of different scales to capture the feature representation of the corresponding area during mask generation. MRFCPM fuses information from three branches of global content awareness, large kernel region awareness, and convolutional channel attention to explicitly model features at different scales. Through extensive experimental evaluation, ESAMask achieves a mask AP of 45.4 at a frame rate of 45.2 FPS on the COCO dataset, surpassing current instance segmentation methods in terms of the accuracy–speed trade-off, as demonstrated by our comprehensive experimental results. 
In addition, the high-quality segmentation results of our proposed method for objects of various classes and scales can be intuitively observed from the visualized segmentation outputs.
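The abstract's description of RSPA (an adjacency matrix used to restrict attention to semantically related regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `region_sparse_attention`, the mean-pooled region descriptors, the top-k routing rule, and the use of raw features as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_sparse_attention(tokens, region_size, top_k):
    """Hypothetical region-routed sparse attention (illustrative only).

    tokens: list of feature vectors; each run of `region_size`
    consecutive tokens is treated as one region.
    Returns (output, routed), where routed[r] lists the regions that
    region r is allowed to attend to.
    """
    assert len(tokens) % region_size == 0
    regions = [list(range(i, i + region_size))
               for i in range(0, len(tokens), region_size)]
    # One descriptor per region: the mean of its token features (assumption).
    descriptors = [mean_vec([tokens[i] for i in r]) for r in regions]
    # Region-level adjacency (affinity) matrix.
    adjacency = [[dot(d_i, d_j) for d_j in descriptors] for d_i in descriptors]

    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = [None] * len(tokens)
    routed = []
    for r_idx, region in enumerate(regions):
        row = adjacency[r_idx]
        # Keep only the top_k most related regions for this region.
        keep = sorted(range(len(regions)),
                      key=lambda j: row[j], reverse=True)[:top_k]
        routed.append(keep)
        key_ids = [i for j in keep for i in regions[j]]
        for q in region:
            # Each query scores top_k * region_size keys, not every token.
            weights = softmax([dot(tokens[q], tokens[k]) * scale
                               for k in key_ids])
            output[q] = [sum(w * tokens[k][d] for w, k in zip(weights, key_ids))
                         for d in range(dim)]
    return output, routed
```

In this sketch each query is compared against `top_k * region_size` keys instead of all tokens, which is the kind of saving the abstract attributes to searching only highly correlated regions.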
format Online
Article
Text
id pubmed-10385500
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10385500 2023-07-30 ESAMask: Real-Time Instance Segmentation Fused with Efficient Sparse Attention Zhang, Qian; Chen, Lu; Shao, Mingwen; Liang, Hong; Ren, Jie Sensors (Basel) Article MDPI 2023-07-16 /pmc/articles/PMC10385500/ /pubmed/37514740 http://dx.doi.org/10.3390/s23146446 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title ESAMask: Real-Time Instance Segmentation Fused with Efficient Sparse Attention
topic Article