
MMNet: A Mixing Module Network for Polyp Segmentation

Traditional encoder–decoder networks like U-Net have been extensively used for polyp segmentation. However, such networks struggle to explicitly model long-range dependencies: local patterns are emphasized over the global context, because each convolutional kernel attends to only a local subset of pixels in the image. Several recent transformer-based networks overcome this limitation by encoding long-range dependencies with self-attention, thus learning highly expressive representations. However, self-attention is expensive to compute over a whole image, as its cost grows quadratically with the number of pixels. Patch embedding has therefore been utilized, grouping small regions of the image into single input features. Nevertheless, these transformers still lack inductive bias, even when the image is treated as a 1D sequence of visual tokens, and so generalize poorly to local contexts due to limited low-level features. We introduce a hybrid transformer combined with a convolutional mixing network to overcome the computational and long-range dependency issues. A pretrained transformer network serves as a feature-extracting encoder, and a mixing module network (MMNet) captures long-range dependencies at a reduced computational cost. Specifically, in the mixing module network, we use depth-wise and 1 × 1 convolutions to establish spatial and cross-channel correlation, respectively, and thereby model long-range dependencies. The proposed approach is evaluated qualitatively and quantitatively on five challenging polyp datasets across six metrics. Our MMNet outperforms the previous best polyp segmentation methods.


Bibliographic Details
Main Authors: Ghimire, Raman; Lee, Sang-Woong
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458640/
https://www.ncbi.nlm.nih.gov/pubmed/37631792
http://dx.doi.org/10.3390/s23167258
author Ghimire, Raman
Lee, Sang-Woong
author_facet Ghimire, Raman
Lee, Sang-Woong
author_sort Ghimire, Raman
collection PubMed
description Traditional encoder–decoder networks like U-Net have been extensively used for polyp segmentation. However, such networks struggle to explicitly model long-range dependencies: local patterns are emphasized over the global context, because each convolutional kernel attends to only a local subset of pixels in the image. Several recent transformer-based networks overcome this limitation by encoding long-range dependencies with self-attention, thus learning highly expressive representations. However, self-attention is expensive to compute over a whole image, as its cost grows quadratically with the number of pixels. Patch embedding has therefore been utilized, grouping small regions of the image into single input features. Nevertheless, these transformers still lack inductive bias, even when the image is treated as a 1D sequence of visual tokens, and so generalize poorly to local contexts due to limited low-level features. We introduce a hybrid transformer combined with a convolutional mixing network to overcome the computational and long-range dependency issues. A pretrained transformer network serves as a feature-extracting encoder, and a mixing module network (MMNet) captures long-range dependencies at a reduced computational cost. Specifically, in the mixing module network, we use depth-wise and 1 × 1 convolutions to establish spatial and cross-channel correlation, respectively, and thereby model long-range dependencies. The proposed approach is evaluated qualitatively and quantitatively on five challenging polyp datasets across six metrics. Our MMNet outperforms the previous best polyp segmentation methods.
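To make the quadratic-cost argument concrete, the sketch below counts the token pairs self-attention must score with and without patch embedding. This is a hypothetical illustration; the 224 × 224 image size and 16 × 16 patch size are assumed for the example, not taken from the paper:

```python
def attention_pairs(height, width, patch=1):
    """Self-attention scores every token against every other token,
    so its cost grows with the square of the token count.  Patch
    embedding groups each patch x patch block of pixels into one
    token, cutting the token count by a factor of patch**2."""
    tokens = (height // patch) * (width // patch)
    return tokens * tokens

# Per-pixel tokens vs. 16x16 patch tokens on a 224x224 image
print(attention_pairs(224, 224))      # 50176^2 = 2517630976 pairs
print(attention_pairs(224, 224, 16))  # 196^2   = 38416 pairs
```

With per-pixel tokens, the pair count is roughly five orders of magnitude larger, which is why patch embedding (and, in this paper, convolutional mixing) is used instead of full-resolution self-attention.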
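The mixing module's core operation pairs a depth-wise convolution (spatial mixing within each channel) with a 1 × 1 convolution (mixing across channels). The NumPy sketch below is a minimal illustration of that factorization, not the authors' implementation; the 3 × 3 kernel size, tensor shapes, and function names are assumptions for the example:

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Depth-wise 3x3 convolution with 'same' zero padding: each
    channel is filtered by its own kernel, so information mixes
    spatially but never across channels."""
    channels, height, width = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                out[c, i, j] = np.sum(padded[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def pointwise_conv(x, weights):
    """1x1 convolution: a per-pixel linear map across channels,
    so information mixes across channels but not spatially."""
    # weights has shape (out_channels, in_channels)
    return np.tensordot(weights, x, axes=([1], [0]))

# Toy mixing step: spatial mixing (depth-wise) then channel mixing (1x1)
x = np.random.rand(4, 8, 8)   # (channels, height, width)
dw = np.random.rand(4, 3, 3)  # one 3x3 kernel per channel
pw = np.random.rand(6, 4)     # project 4 channels -> 6 channels
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)  # (6, 8, 8)
```

Factoring a dense convolution into these two steps is what keeps the cost low: the depth-wise pass touches each channel independently, and the 1 × 1 pass is just a matrix multiply at every pixel.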
format Online
Article
Text
id pubmed-10458640
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-104586402023-08-27 MMNet: A Mixing Module Network for Polyp Segmentation Ghimire, Raman Lee, Sang-Woong Sensors (Basel) Article
MDPI 2023-08-18 /pmc/articles/PMC10458640/ /pubmed/37631792 http://dx.doi.org/10.3390/s23167258 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ghimire, Raman
Lee, Sang-Woong
MMNet: A Mixing Module Network for Polyp Segmentation
title MMNet: A Mixing Module Network for Polyp Segmentation
title_full MMNet: A Mixing Module Network for Polyp Segmentation
title_fullStr MMNet: A Mixing Module Network for Polyp Segmentation
title_full_unstemmed MMNet: A Mixing Module Network for Polyp Segmentation
title_short MMNet: A Mixing Module Network for Polyp Segmentation
title_sort mmnet: a mixing module network for polyp segmentation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458640/
https://www.ncbi.nlm.nih.gov/pubmed/37631792
http://dx.doi.org/10.3390/s23167258
work_keys_str_mv AT ghimireraman mmnetamixingmodulenetworkforpolypsegmentation
AT leesangwoong mmnetamixingmodulenetworkforpolypsegmentation