Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection
In this paper, we propose a novel target-aware token design for transformer-based object detection. To tackle the target attribute diffusion challenge of transformer-based object detection, we propose two key components in the new target-aware token design mechanism. Firstly, we propose a target-awa...
Main authors: | Xie, Tianming; Zhang, Zhonghao; Tian, Jing; Ma, Lihong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699219/ https://www.ncbi.nlm.nih.gov/pubmed/36433282 http://dx.doi.org/10.3390/s22228686 |
_version_ | 1784839017934618624 |
---|---|
author | Xie, Tianming; Zhang, Zhonghao; Tian, Jing; Ma, Lihong |
author_facet | Xie, Tianming; Zhang, Zhonghao; Tian, Jing; Ma, Lihong |
author_sort | Xie, Tianming |
collection | PubMed |
description | This paper proposes a target-aware token design for transformer-based object detection that addresses the target attribute diffusion problem. The design has two key components. The first is a target-aware sampling module, which forces the sampling patterns to converge inside the target region and obtain its representative encoded features. It uses a set of four sampling patterns: small and large patterns, which focus on the detailed and overall characteristics of a target, respectively, and vertical and horizontal patterns, which handle the object's directional structures. The second is a target-aware key-value matrix: a unified, learnable feature-embedding matrix that is applied directly as a weighting on the feature map to reduce interference from non-target regions. The resulting model, called Focal DETR, outperforms state-of-the-art transformer-based object-detection models on the COCO object-detection benchmark. In experiments, Focal DETR achieves 44.7 AP on the COCO 2017 test set, which is 2.7 AP and 0.9 AP higher than DETR and Deformable DETR, respectively, under the same training strategy and the same feature-extraction network. |
format | Online Article Text |
id | pubmed-9699219 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9699219 2022-11-26 Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong Sensors (Basel) Article MDPI 2022-11-10 /pmc/articles/PMC9699219/ /pubmed/36433282 http://dx.doi.org/10.3390/s22228686 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Xie, Tianming Zhang, Zhonghao Tian, Jing Ma, Lihong Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection |
title | Focal DETR: Target-Aware Token Design for Transformer-Based Object Detection |
title_sort | focal detr: target-aware token design for transformer-based object detection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9699219/ https://www.ncbi.nlm.nih.gov/pubmed/36433282 http://dx.doi.org/10.3390/s22228686 |
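The description above names four sampling patterns (small, large, vertical, horizontal) that are meant to stay inside the target region. The following is a minimal sketch of that idea only, not the authors' implementation: the record does not give the real offsets, point counts, or scales, so `PATTERN_SCALES`, `sample_points`, `inside`, and every numeric value are hypothetical choices made for illustration.

```python
# Hypothetical illustration: all names and numbers here are assumptions,
# not values taken from the Focal DETR paper.
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)

# Assumed (x_scale, y_scale) per pattern, as fractions of the box half-size.
PATTERN_SCALES: Dict[str, Tuple[float, float]] = {
    "small": (0.25, 0.25),     # fine detail close to the box center
    "large": (0.9, 0.9),       # overall extent of the target
    "vertical": (0.0, 0.9),    # points spread along the vertical axis
    "horizontal": (0.9, 0.0),  # points spread along the horizontal axis
}


def sample_points(box: Box, pattern: str) -> List[Point]:
    """Return four sampling points for one pattern, all within the box."""
    cx, cy, w, h = box
    sx, sy = PATTERN_SCALES[pattern]
    dx, dy = sx * w / 2, sy * h / 2
    if pattern in ("vertical", "horizontal"):
        # Four evenly spaced points on a line through the box center.
        steps = (-1.0, -1 / 3, 1 / 3, 1.0)
        return [(cx + t * dx, cy + t * dy) for t in steps]
    # Four corner points of a scaled rectangle around the box center.
    return [(cx + i * dx, cy + j * dy) for i in (-1, 1) for j in (-1, 1)]


def inside(box: Box, point: Point) -> bool:
    """Check that a point lies within the box, edges included."""
    cx, cy, w, h = box
    return abs(point[0] - cx) <= w / 2 and abs(point[1] - cy) <= h / 2
```

For a box such as `(50.0, 40.0, 20.0, 10.0)`, calling `sample_points(box, "vertical")` yields four points sharing the x-coordinate of the center, and `inside` confirms each one stays within the box. In a real detector the offsets would be learned rather than fixed; the fixed scales here only make the geometric constraint visible.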