Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection

In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-awa...

Bibliographic Details
Main Authors:	Xie, Tianming, Zhang, Zhonghao, Tian, Jing, Ma, Lihong
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699219/ https://www.ncbi.nlm.nih.gov/pubmed/36433282 http://dx.doi.org/10.3390/s22228686

_version_	1784839017934618624
author	Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong
author_facet	Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong
author_sort	Xie, Tianming
collection	PubMed
description	In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-aware sampling module, which forces the sampling patterns to converge inside the target region and obtain its representative encoded features. More specifically, a set of four sampling patterns is designed, including small and large patterns, which focus on the detailed and overall characteristics of a target, respectively, as well as vertical and horizontal patterns, which handle the object’s directional structures. Secondly, we propose a target-aware key-value matrix. This is a unified, learnable, feature-embedding matrix which is directly weighted on the feature map to reduce the interference of non-target regions. With such a new design, we propose a new variant of the transformer-based object-detection model, called Focal DETR, which achieves superior performance over the state-of-the-art transformer-based object-detection models on the COCO object-detection benchmark dataset. Experimental results demonstrate that our Focal DETR achieves 44.7 AP on the COCO 2017 test set, which is 2.7 AP and 0.9 AP higher than DETR and Deformable DETR, respectively, using the same training strategy and the same feature-extraction network.
format	Online Article Text
id	pubmed-9699219
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9699219 2022-11-26 Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong Sensors (Basel) Article In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-aware sampling module, which forces the sampling patterns to converge inside the target region and obtain its representative encoded features. More specifically, a set of four sampling patterns is designed, including small and large patterns, which focus on the detailed and overall characteristics of a target, respectively, as well as vertical and horizontal patterns, which handle the object’s directional structures. Secondly, we propose a target-aware key-value matrix. This is a unified, learnable, feature-embedding matrix which is directly weighted on the feature map to reduce the interference of non-target regions. With such a new design, we propose a new variant of the transformer-based object-detection model, called Focal DETR, which achieves superior performance over the state-of-the-art transformer-based object-detection models on the COCO object-detection benchmark dataset. Experimental results demonstrate that our Focal DETR achieves 44.7 AP on the COCO 2017 test set, which is 2.7 AP and 0.9 AP higher than DETR and Deformable DETR, respectively, using the same training strategy and the same feature-extraction network. MDPI 2022-11-10 /pmc/articles/PMC9699219/ /pubmed/36433282 http://dx.doi.org/10.3390/s22228686 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title	Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_full	Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_fullStr	Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_full_unstemmed	Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_short	Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_sort	focal detr: target-aware token design for transformer-based object detection
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699219/ https://www.ncbi.nlm.nih.gov/pubmed/36433282 http://dx.doi.org/10.3390/s22228686
work_keys_str_mv	AT xietianming focaldetrtargetawaretokendesignfortransformerbasedobjectdetection AT zhangzhonghao focaldetrtargetawaretokendesignfortransformerbasedobjectdetection AT tianjing focaldetrtargetawaretokendesignfortransformerbasedobjectdetection AT malihong focaldetrtargetawaretokendesignfortransformerbasedobjectdetection