Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution

BACKGROUND: Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially for three-dimensional (3D) magnetic resonance imaging, which is often used in clinical practice. Automatic segmentation of the 3D structure of brain tumors can quickly help physicians understand...

Bibliographic Details
Main authors:	Cai, Yimin; Long, Yuqing; Han, Zhenggong; Liu, Mingkun; Zheng, Yuchen; Yang, Wei; Chen, Liming
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926542/ https://www.ncbi.nlm.nih.gov/pubmed/36788560 http://dx.doi.org/10.1186/s12911-023-02129-z

author	Cai, Yimin; Long, Yuqing; Han, Zhenggong; Liu, Mingkun; Zheng, Yuchen; Yang, Wei; Chen, Liming
author_sort	Cai, Yimin
collection	PubMed
description	BACKGROUND: Semantic segmentation of brain tumors plays a critical role in clinical treatment, especially for three-dimensional (3D) magnetic resonance imaging, which is often used in clinical practice. Automatic segmentation of the 3D structure of brain tumors can quickly help physicians understand the properties of tumors, such as the shape and size, thus improving the efficiency of preoperative planning and the odds of successful surgery. In past decades, 3D convolutional neural networks (CNNs) have dominated automatic segmentation methods for 3D medical images, and these network structures have achieved good results. However, to reduce the number of network parameters, practitioners generally keep the convolutional kernels in 3D convolutions no larger than [Formula: see text], which limits the ability of CNNs to learn long-distance dependencies. The Vision Transformer (ViT) is very good at learning long-distance dependencies in images, but it has a large number of parameters. Worse, when data are insufficient, the ViT cannot learn local dependency information in its earlier layers. In image segmentation, however, the ability to learn this local dependency information in the earlier layers has a large impact on model performance. METHODS: This paper proposes the Swin Unet3D model, which formulates voxel segmentation of medical images as a sequence-to-sequence prediction. The feature extraction sub-module of the model is designed as a parallel structure of convolution and ViT, so that every layer of the model can adequately learn both global and local dependency information in the image. RESULTS: On the BraTS 2021 validation dataset, the proposed model achieves Dice coefficients of 0.840, 0.874, and 0.911 on the enhancing tumor (ET), tumor core (TC), and whole tumor (WT) channels, respectively. On the BraTS 2018 validation dataset, the model achieves Dice coefficients of 0.716, 0.761, and 0.874 on the corresponding channels. CONCLUSION: We propose a new segmentation model that combines the advantages of the Vision Transformer and convolution and achieves a better balance between the number of model parameters and segmentation accuracy. The code can be found at https://github.com/1152545264/SwinUnet3D.
format	Online Article Text
id	pubmed-9926542
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9926542 2023-02-15 BMC Med Inform Decis Mak Research BioMed Central 2023-02-14 /pmc/articles/PMC9926542/ /pubmed/36788560 http://dx.doi.org/10.1186/s12911-023-02129-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926542/ https://www.ncbi.nlm.nih.gov/pubmed/36788560 http://dx.doi.org/10.1186/s12911-023-02129-z
