
Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX

Object detection technology plays a crucial role in people's everyday lives, as well as enterprise production and modern national defense. Most current object detection networks, such as YOLOX, employ convolutional neural networks instead of a Transformer as a backbone. However, these technique...


Bibliographic Details
Main authors:	Shi, Peicheng; Chen, Xinhe; Qi, Heng; Zhang, Chenghui; Liu, Zhiqiang
Format:	Online Article Text
Language:	English
Published:	Hindawi 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10019960/ https://www.ncbi.nlm.nih.gov/pubmed/36936669 http://dx.doi.org/10.1155/2023/4228610

_version_	1784908144199073792
author	Shi, Peicheng Chen, Xinhe Qi, Heng Zhang, Chenghui Liu, Zhiqiang
author_facet	Shi, Peicheng Chen, Xinhe Qi, Heng Zhang, Chenghui Liu, Zhiqiang
author_sort	Shi, Peicheng
collection	PubMed
description	Object detection technology plays a crucial role in people's everyday lives, as well as enterprise production and modern national defense. Most current object detection networks, such as YOLOX, employ convolutional neural networks instead of a Transformer as a backbone. However, these techniques lack a global understanding of the images and may lose meaningful information, such as the precise location of the most active feature detector. Recently, a Transformer with larger receptive fields showed superior performance to corresponding convolutional neural networks in computer vision tasks. The Transformer splits the image into patches and subsequently feeds them to the Transformer in a sequence structure similar to word embeddings. This makes it capable of global modeling of entire images and implies global understanding of images. However, simply using a Transformer with a larger receptive field raises several concerns. For example, self-attention in the Swin Transformer backbone will limit its ability to model long-range relations, resulting in poor feature extraction results and low convergence speed during training. To address the above problems, first, we propose an important region-based Reconstructed Deformable Self-Attention that shifts attention to important regions for efficient global modeling. Second, based on the Reconstructed Deformable Self-Attention, we propose the Swin Deformable Transformer backbone, which improves the feature extraction ability and convergence speed. Finally, based on the Swin Deformable Transformer backbone, we propose a novel object detection network, namely, Swin Deformable Transformer-BiPAFPN-YOLOX. Experimental results on the COCO dataset show that the training period is reduced by 55.4%, average precision is increased by 2.4%, average precision of small objects is increased by 3.7%, and inference speed is increased by 35%.
format	Online Article Text
id	pubmed-10019960
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-100199602023-03-17 Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX Shi, Peicheng Chen, Xinhe Qi, Heng Zhang, Chenghui Liu, Zhiqiang Comput Intell Neurosci Research Article Object detection technology plays a crucial role in people's everyday lives, as well as enterprise production and modern national defense. Most current object detection networks, such as YOLOX, employ convolutional neural networks instead of a Transformer as a backbone. However, these techniques lack a global understanding of the images and may lose meaningful information, such as the precise location of the most active feature detector. Recently, a Transformer with larger receptive fields showed superior performance to corresponding convolutional neural networks in computer vision tasks. The Transformer splits the image into patches and subsequently feeds them to the Transformer in a sequence structure similar to word embeddings. This makes it capable of global modeling of entire images and implies global understanding of images. However, simply using a Transformer with a larger receptive field raises several concerns. For example, self-attention in the Swin Transformer backbone will limit its ability to model long-range relations, resulting in poor feature extraction results and low convergence speed during training. To address the above problems, first, we propose an important region-based Reconstructed Deformable Self-Attention that shifts attention to important regions for efficient global modeling. Second, based on the Reconstructed Deformable Self-Attention, we propose the Swin Deformable Transformer backbone, which improves the feature extraction ability and convergence speed. Finally, based on the Swin Deformable Transformer backbone, we propose a novel object detection network, namely, Swin Deformable Transformer-BiPAFPN-YOLOX. Experimental results on the COCO dataset show that the training period is reduced by 55.4%, average precision is increased by 2.4%, average precision of small objects is increased by 3.7%, and inference speed is increased by 35%. Hindawi 2023-03-09 /pmc/articles/PMC10019960/ /pubmed/36936669 http://dx.doi.org/10.1155/2023/4228610 Text en Copyright © 2023 Peicheng Shi et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Shi, Peicheng Chen, Xinhe Qi, Heng Zhang, Chenghui Liu, Zhiqiang Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX
title	Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX
title_full	Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX
title_fullStr	Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX
title_full_unstemmed	Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX
title_short	Object Detection Based on Swin Deformable Transformer-BiPAFPN-YOLOX
title_sort	object detection based on swin deformable transformer-bipafpn-yolox
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10019960/ https://www.ncbi.nlm.nih.gov/pubmed/36936669 http://dx.doi.org/10.1155/2023/4228610
work_keys_str_mv	AT shipeicheng objectdetectionbasedonswindeformabletransformerbipafpnyolox AT chenxinhe objectdetectionbasedonswindeformabletransformerbipafpnyolox AT qiheng objectdetectionbasedonswindeformabletransformerbipafpnyolox AT zhangchenghui objectdetectionbasedonswindeformabletransformerbipafpnyolox AT liuzhiqiang objectdetectionbasedonswindeformabletransformerbipafpnyolox
