MMNet: A Mixing Module Network for Polyp Segmentation
Traditional encoder–decoder networks like U-Net have been extensively used for polyp segmentation. However, such networks have demonstrated limitations in explicitly modeling long-range dependencies. In such networks, local patterns are emphasized over the global context, as each convolutional kernel focuses on only a local subset of pixels in the entire image. Several recent transformer-based networks have been shown to overcome such limitations. Such networks encode long-range dependencies using self-attention methods and thus learn highly expressive representations. However, due to the computational complexity of modeling the whole image, self-attention is expensive to compute, as there is a quadratic increment in cost with the increase in pixels in the image. Thus, patch embedding has been utilized, which groups small regions of the image into single input features. Nevertheless, these transformers still lack inductive bias, even with the image as a 1D sequence of visual tokens. This results in the inability to generalize to local contexts due to limited low-level features. We introduce a hybrid transformer combined with a convolutional mixing network to overcome computational and long-range dependency issues. A pretrained transformer network is introduced as a feature-extracting encoder, and a mixing module network (MMNet) is introduced to capture the long-range dependencies with a reduced computational cost. Precisely, in the mixing module network, we use depth-wise and 1 × 1 convolution to model long-range dependencies to establish spatial and cross-channel correlation, respectively. The proposed approach is evaluated qualitatively and quantitatively on five challenging polyp datasets across six metrics. Our MMNet outperforms the previous best polyp segmentation methods.
Main Authors: | Ghimire, Raman; Lee, Sang-Woong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458640/ https://www.ncbi.nlm.nih.gov/pubmed/37631792 http://dx.doi.org/10.3390/s23167258 |
_version_ | 1785097214397251584 |
---|---|
author | Ghimire, Raman Lee, Sang-Woong |
author_facet | Ghimire, Raman Lee, Sang-Woong |
author_sort | Ghimire, Raman |
collection | PubMed |
description | Traditional encoder–decoder networks like U-Net have been extensively used for polyp segmentation. However, such networks have demonstrated limitations in explicitly modeling long-range dependencies. In such networks, local patterns are emphasized over the global context, as each convolutional kernel focuses on only a local subset of pixels in the entire image. Several recent transformer-based networks have been shown to overcome such limitations. Such networks encode long-range dependencies using self-attention methods and thus learn highly expressive representations. However, due to the computational complexity of modeling the whole image, self-attention is expensive to compute, as there is a quadratic increment in cost with the increase in pixels in the image. Thus, patch embedding has been utilized, which groups small regions of the image into single input features. Nevertheless, these transformers still lack inductive bias, even with the image as a 1D sequence of visual tokens. This results in the inability to generalize to local contexts due to limited low-level features. We introduce a hybrid transformer combined with a convolutional mixing network to overcome computational and long-range dependency issues. A pretrained transformer network is introduced as a feature-extracting encoder, and a mixing module network (MMNet) is introduced to capture the long-range dependencies with a reduced computational cost. Precisely, in the mixing module network, we use depth-wise and 1 × 1 convolution to model long-range dependencies to establish spatial and cross-channel correlation, respectively. The proposed approach is evaluated qualitatively and quantitatively on five challenging polyp datasets across six metrics. Our MMNet outperforms the previous best polyp segmentation methods. |
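The abstract's core idea — a depth-wise convolution mixes information *spatially* within each channel, while a 1 × 1 (point-wise) convolution mixes information *across* channels — can be illustrated with a minimal sketch. The kernel sizes, weights, and toy input below are hypothetical; the paper's actual MMNet module is more elaborate than this two-step composition.

```python
def depthwise_conv(x, kernels):
    """Per-channel 2D convolution (valid padding): spatial mixing only.
    x: [C][H][W], kernels: [C][k][k] -> [C][H-k+1][W-k+1]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    out = []
    for c in range(C):  # each channel uses its own kernel; no cross-channel flow
        plane = [[sum(x[c][i + di][j + dj] * kernels[c][di][dj]
                      for di in range(k) for dj in range(k))
                  for j in range(W - k + 1)]
                 for i in range(H - k + 1)]
        out.append(plane)
    return out

def pointwise_conv(x, weights):
    """1x1 convolution: cross-channel mixing at each spatial position.
    x: [C_in][H][W], weights: [C_out][C_in] -> [C_out][H][W]."""
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(C_in))
              for j in range(W)]
             for i in range(H)]
            for o in range(len(weights))]

# Toy input: 2 channels of a 4x4 feature map (constant 1.0 and 2.0).
x = [[[1.0] * 4 for _ in range(4)],
     [[2.0] * 4 for _ in range(4)]]
dw = depthwise_conv(x, kernels=[[[1 / 9] * 3 for _ in range(3)]] * 2)  # 3x3 mean filter per channel
mixed = pointwise_conv(dw, weights=[[0.5, 0.5]])                       # average the two channels
print(len(mixed), len(mixed[0]), len(mixed[0][0]))  # 1 2 2
```

Factoring a full convolution this way is what keeps the mixing cheap: a depth-wise pass costs O(C·k²) weights and a point-wise pass O(C_in·C_out), versus O(C_in·C_out·k²) for a standard convolution.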
format | Online Article Text |
id | pubmed-10458640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-104586402023-08-27 MMNet: A Mixing Module Network for Polyp Segmentation Ghimire, Raman Lee, Sang-Woong Sensors (Basel) Article Traditional encoder–decoder networks like U-Net have been extensively used for polyp segmentation. However, such networks have demonstrated limitations in explicitly modeling long-range dependencies. In such networks, local patterns are emphasized over the global context, as each convolutional kernel focuses on only a local subset of pixels in the entire image. Several recent transformer-based networks have been shown to overcome such limitations. Such networks encode long-range dependencies using self-attention methods and thus learn highly expressive representations. However, due to the computational complexity of modeling the whole image, self-attention is expensive to compute, as there is a quadratic increment in cost with the increase in pixels in the image. Thus, patch embedding has been utilized, which groups small regions of the image into single input features. Nevertheless, these transformers still lack inductive bias, even with the image as a 1D sequence of visual tokens. This results in the inability to generalize to local contexts due to limited low-level features. We introduce a hybrid transformer combined with a convolutional mixing network to overcome computational and long-range dependency issues. A pretrained transformer network is introduced as a feature-extracting encoder, and a mixing module network (MMNet) is introduced to capture the long-range dependencies with a reduced computational cost. Precisely, in the mixing module network, we use depth-wise and 1 × 1 convolution to model long-range dependencies to establish spatial and cross-channel correlation, respectively. The proposed approach is evaluated qualitatively and quantitatively on five challenging polyp datasets across six metrics. Our MMNet outperforms the previous best polyp segmentation methods. 
MDPI 2023-08-18 /pmc/articles/PMC10458640/ /pubmed/37631792 http://dx.doi.org/10.3390/s23167258 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ghimire, Raman Lee, Sang-Woong MMNet: A Mixing Module Network for Polyp Segmentation |
title | MMNet: A Mixing Module Network for Polyp Segmentation |
title_full | MMNet: A Mixing Module Network for Polyp Segmentation |
title_fullStr | MMNet: A Mixing Module Network for Polyp Segmentation |
title_full_unstemmed | MMNet: A Mixing Module Network for Polyp Segmentation |
title_short | MMNet: A Mixing Module Network for Polyp Segmentation |
title_sort | mmnet: a mixing module network for polyp segmentation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458640/ https://www.ncbi.nlm.nih.gov/pubmed/37631792 http://dx.doi.org/10.3390/s23167258 |
work_keys_str_mv | AT ghimireraman mmnetamixingmodulenetworkforpolypsegmentation AT leesangwoong mmnetamixingmodulenetworkforpolypsegmentation |