
Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes

Transformer-based object detection has recently attracted increasing interest and shown promising results. As one of the DETR-like models, DETR with improved denoising anchor boxes (DINO) produced superior performance on COCO val2017 and achieved a new state of the art. However, it often encounters...

Full description

Bibliographic Details
Main Authors: Geng, Huantong, Jiang, Jun, Shen, Junye, Hou, Mengmeng
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9783326/
https://www.ncbi.nlm.nih.gov/pubmed/36560000
http://dx.doi.org/10.3390/s22249629
_version_ 1784857551847817216
author Geng, Huantong
Jiang, Jun
Shen, Junye
Hou, Mengmeng
author_facet Geng, Huantong
Jiang, Jun
Shen, Junye
Hou, Mengmeng
author_sort Geng, Huantong
collection PubMed
description Transformer-based object detection has recently attracted increasing interest and shown promising results. As one of the DETR-like models, DETR with improved denoising anchor boxes (DINO) produced superior performance on COCO val2017 and achieved a new state of the art. However, it often encounters challenges when applied to new scenarios where no annotated data is available, and the imaging conditions differ significantly. To alleviate this problem of domain shift, in this paper, unsupervised domain adaptive DINO via cascading alignment (CA-DINO) was proposed, which consists of attention-enhanced double discriminators (AEDD) and weak-restraints on category-level token (WROT). Specifically, AEDD is used to aggregate and align the local–global context from the feature representations of both domains while reducing the domain discrepancy before entering the transformer encoder and decoder. WROT extends Deep CORAL loss to adapt class tokens after embedding, minimizing the difference in second-order statistics between the source and target domain. Our approach is trained end to end, and experiments on two challenging benchmarks demonstrate the effectiveness of our method, which yields 41% relative improvement compared to baseline on the benchmark dataset Foggy Cityscapes, in particular.
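
Note: the WROT component described above builds on the Deep CORAL loss, which aligns the second-order statistics (feature covariances) of source- and target-domain features. The following is only a minimal illustrative sketch of that underlying loss, not the authors' released code; the PyTorch framework, the name coral_loss, and the token shapes are assumptions for illustration.

# Hypothetical sketch of a Deep CORAL-style loss on domain feature batches
# (e.g. category-level tokens after embedding). Not the CA-DINO implementation.
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius distance between the covariance matrices of
    source and target feature batches, scaled as in Deep CORAL.

    source, target: [n_s, d] and [n_t, d] feature matrices.
    """
    d = source.size(1)

    def covariance(x: torch.Tensor) -> torch.Tensor:
        n = x.size(0)
        x_centered = x - x.mean(dim=0, keepdim=True)
        return x_centered.t() @ x_centered / (n - 1)

    c_s = covariance(source)
    c_t = covariance(target)
    return (c_s - c_t).pow(2).sum() / (4 * d * d)

if __name__ == "__main__":
    # Hypothetical source- and target-domain tokens of dimension 256.
    src_tokens = torch.randn(32, 256)
    tgt_tokens = torch.randn(32, 256)
    print(coral_loss(src_tokens, tgt_tokens).item())

In a training loop, such a term would be added (with a weighting factor) to the detection objective so that the source- and target-domain token statistics are pulled together; the abstract states that CA-DINO applies this idea to class tokens after embedding, with details given in the paper.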
format Online
Article
Text
id pubmed-9783326
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9783326 2022-12-24 Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes Geng, Huantong Jiang, Jun Shen, Junye Hou, Mengmeng Sensors (Basel) Article Transformer-based object detection has recently attracted increasing interest and shown promising results. As one of the DETR-like models, DETR with improved denoising anchor boxes (DINO) produced superior performance on COCO val2017 and achieved a new state of the art. However, it often encounters challenges when applied to new scenarios where no annotated data is available, and the imaging conditions differ significantly. To alleviate this problem of domain shift, in this paper, unsupervised domain adaptive DINO via cascading alignment (CA-DINO) was proposed, which consists of attention-enhanced double discriminators (AEDD) and weak-restraints on category-level token (WROT). Specifically, AEDD is used to aggregate and align the local–global context from the feature representations of both domains while reducing the domain discrepancy before entering the transformer encoder and decoder. WROT extends Deep CORAL loss to adapt class tokens after embedding, minimizing the difference in second-order statistics between the source and target domain. Our approach is trained end to end, and experiments on two challenging benchmarks demonstrate the effectiveness of our method, which yields 41% relative improvement compared to baseline on the benchmark dataset Foggy Cityscapes, in particular. MDPI 2022-12-08 /pmc/articles/PMC9783326/ /pubmed/36560000 http://dx.doi.org/10.3390/s22249629 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Geng, Huantong
Jiang, Jun
Shen, Junye
Hou, Mengmeng
Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes
title Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes
title_full Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes
title_fullStr Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes
title_full_unstemmed Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes
title_short Cascading Alignment for Unsupervised Domain-Adaptive DETR with Improved DeNoising Anchor Boxes
title_sort cascading alignment for unsupervised domain-adaptive detr with improved denoising anchor boxes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9783326/
https://www.ncbi.nlm.nih.gov/pubmed/36560000
http://dx.doi.org/10.3390/s22249629
work_keys_str_mv AT genghuantong cascadingalignmentforunsuperviseddomainadaptivedetrwithimproveddenoisinganchorboxes
AT jiangjun cascadingalignmentforunsuperviseddomainadaptivedetrwithimproveddenoisinganchorboxes
AT shenjunye cascadingalignmentforunsuperviseddomainadaptivedetrwithimproveddenoisinganchorboxes
AT houmengmeng cascadingalignmentforunsuperviseddomainadaptivedetrwithimproveddenoisinganchorboxes