Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection

In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-awa...

Bibliographic Details
Main authors: Xie, Tianming, Zhang, Zhonghao, Tian, Jing, Ma, Lihong
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699219/
https://www.ncbi.nlm.nih.gov/pubmed/36433282
http://dx.doi.org/10.3390/s22228686
_version_ 1784839017934618624
author Xie, Tianming
Zhang, Zhonghao
Tian, Jing
Ma, Lihong
author_facet Xie, Tianming
Zhang, Zhonghao
Tian, Jing
Ma, Lihong
author_sort Xie, Tianming
collection PubMed
description In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-aware sampling module, which forces the sampling patterns to converge inside the target region and obtain its representative encoded features. More specifically, a set of four sampling patterns is designed, including small and large patterns, which focus on the detailed and overall characteristics of a target, respectively, as well as vertical and horizontal patterns, which handle the object’s directional structures. Secondly, we propose a target-aware key-value matrix. This is a unified, learnable, feature-embedding matrix which is directly weighted on the feature map to reduce the interference of non-target regions. With this new design, we propose a new variant of the transformer-based object-detection model, called Focal DETR, which achieves superior performance over the state-of-the-art transformer-based object-detection models on the COCO object-detection benchmark dataset. Experimental results demonstrate that our Focal DETR achieves 44.7 AP on the COCO 2017 test set, which is 2.7 AP and 0.9 AP higher than DETR and Deformable DETR, respectively, using the same training strategy and the same feature-extraction network.
format Online
Article
Text
id pubmed-9699219
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9699219 2022-11-26 Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong Sensors (Basel) Article In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-aware sampling module, which forces the sampling patterns to converge inside the target region and obtain its representative encoded features. More specifically, a set of four sampling patterns is designed, including small and large patterns, which focus on the detailed and overall characteristics of a target, respectively, as well as vertical and horizontal patterns, which handle the object’s directional structures. Secondly, we propose a target-aware key-value matrix. This is a unified, learnable, feature-embedding matrix which is directly weighted on the feature map to reduce the interference of non-target regions. With this new design, we propose a new variant of the transformer-based object-detection model, called Focal DETR, which achieves superior performance over the state-of-the-art transformer-based object-detection models on the COCO object-detection benchmark dataset. Experimental results demonstrate that our Focal DETR achieves 44.7 AP on the COCO 2017 test set, which is 2.7 AP and 0.9 AP higher than DETR and Deformable DETR, respectively, using the same training strategy and the same feature-extraction network. MDPI 2022-11-10 /pmc/articles/PMC9699219/ /pubmed/36433282 http://dx.doi.org/10.3390/s22228686 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Xie, Tianming
Zhang, Zhonghao
Tian, Jing
Ma, Lihong
Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_full Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_fullStr Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_full_unstemmed Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_short Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
title_sort focal detr: target-aware token design for transformer-based object detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699219/
https://www.ncbi.nlm.nih.gov/pubmed/36433282
http://dx.doi.org/10.3390/s22228686
work_keys_str_mv AT xietianming focaldetrtargetawaretokendesignfortransformerbasedobjectdetection
AT zhangzhonghao focaldetrtargetawaretokendesignfortransformerbasedobjectdetection
AT tianjing focaldetrtargetawaretokendesignfortransformerbasedobjectdetection
AT malihong focaldetrtargetawaretokendesignfortransformerbasedobjectdetection