Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation

Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and s...

Bibliographic Details
Main authors: Xia, Zhengyu; Kim, Joohee
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9867439/
https://www.ncbi.nlm.nih.gov/pubmed/36679377
http://dx.doi.org/10.3390/s23020581
_version_ 1784876342431449088
author Xia, Zhengyu
Kim, Joohee
author_facet Xia, Zhengyu
Kim, Joohee
author_sort Xia, Zhengyu
collection PubMed
description Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and segmenting small objects due to relying heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the performance of learning local information and segmenting small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain can be obtained without additional computation at inference. Experimental results show that our model can achieve state-of-the-art performance (57.6% mIoU) on the ADE20K and (84.8% mIoU) on the Cityscapes datasets.
format Online
Article
Text
id pubmed-9867439
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-98674392023-01-22 Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation Xia, Zhengyu Kim, Joohee Sensors (Basel) Article Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and segmenting small objects due to relying heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the performance of learning local information and segmenting small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain can be obtained without additional computation at inference. Experimental results show that our model can achieve state-of-the-art performance (57.6% mIoU) on the ADE20K and (84.8% mIoU) on the Cityscapes datasets. MDPI 2023-01-04 /pmc/articles/PMC9867439/ /pubmed/36679377 http://dx.doi.org/10.3390/s23020581 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Xia, Zhengyu
Kim, Joohee
Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title_full Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title_fullStr Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title_full_unstemmed Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title_short Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
title_sort enhancing mask transformer with auxiliary convolution layers for semantic segmentation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9867439/
https://www.ncbi.nlm.nih.gov/pubmed/36679377
http://dx.doi.org/10.3390/s23020581
work_keys_str_mv AT xiazhengyu enhancingmasktransformerwithauxiliaryconvolutionlayersforsemanticsegmentation
AT kimjoohee enhancingmasktransformerwithauxiliaryconvolutionlayersforsemanticsegmentation
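
Illustrative note: the abstract above describes auxiliary convolution branches that are attached on the encoder side during training and removed at inference. The toy Python sketch below only illustrates that general training-only-branch pattern; it is not the authors' implementation and does not use Mask2Former. All names (`ToySegmenter`, `conv1d_same`, `aux_kernel`, `aux_weight`) and the loss weight 0.4 are hypothetical, chosen for illustration.

```python
from typing import List, Optional

Vector = List[float]


def conv1d_same(x: Vector, kernel: Vector) -> Vector:
    """Toy 1-D convolution with zero padding (stand-in for a conv layer).

    Assumes an odd-length kernel so the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error, used here as a placeholder loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


class ToySegmenter:
    """Hypothetical model: shared encoder, main head, optional auxiliary head."""

    def __init__(self, aux_kernel: Optional[Vector] = None,
                 aux_weight: float = 0.4) -> None:
        self.aux_kernel = aux_kernel  # None means the auxiliary branch is removed
        self.aux_weight = aux_weight  # assumed loss weight, not taken from the paper
        self.aux_calls = 0            # counts how often the auxiliary branch runs

    def encode(self, x: Vector) -> Vector:
        # Placeholder for the shared encoder.
        return [2.0 * v for v in x]

    def main_head(self, feats: Vector) -> Vector:
        # Placeholder for the main prediction path.
        return [v + 1.0 for v in feats]

    def aux_head(self, feats: Vector) -> Vector:
        # Auxiliary convolution branch reading the encoder features.
        self.aux_calls += 1
        return conv1d_same(feats, self.aux_kernel)

    def training_loss(self, x: Vector, target: Vector) -> float:
        # Training: main loss plus a weighted auxiliary loss, if the branch exists.
        feats = self.encode(x)
        loss = mse(self.main_head(feats), target)
        if self.aux_kernel is not None:
            loss += self.aux_weight * mse(self.aux_head(feats), target)
        return loss

    def predict(self, x: Vector) -> Vector:
        # Inference: only the main path runs, so the auxiliary branch adds no cost.
        return self.main_head(self.encode(x))
```

In this sketch, `predict` never touches the auxiliary head, so a model built with `aux_kernel=None` returns the same predictions as one built with the branch attached. In an actual network the benefit would come from the auxiliary loss shaping the shared encoder weights during training, which this toy example does not model.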