Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking
RGB-T tracking involves the use of images from both visible and thermal modalities. The primary objective is to adaptively leverage the relatively dominant modality in varying conditions to achieve more robust tracking compared to single-modality tracking. An RGB-T tracker based on a mixed-attention...
Main Authors: | Luo, Yang; Guo, Xiqing; Dong, Mingtao; Yu, Jin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384326/ https://www.ncbi.nlm.nih.gov/pubmed/37514902 http://dx.doi.org/10.3390/s23146609 |
_version_ | 1785081130001629184 |
---|---|
author | Luo, Yang Guo, Xiqing Dong, Mingtao Yu, Jin |
author_facet | Luo, Yang Guo, Xiqing Dong, Mingtao Yu, Jin |
author_sort | Luo, Yang |
collection | PubMed |
description | RGB-T tracking involves the use of images from both visible and thermal modalities. The primary objective is to adaptively leverage the relatively dominant modality in varying conditions to achieve more robust tracking compared to single-modality tracking. An RGB-T tracker based on a mixed-attention mechanism to achieve a complementary fusion of modalities (referred to as MACFT) is proposed in this paper. In the feature extraction stage, we utilize different transformer backbone branches to extract specific and shared information from different modalities. By performing mixed-attention operations in the backbone to enable information interaction and self-enhancement between the template and search images, a robust feature representation is constructed that better understands the high-level semantic features of the target. Then, in the feature fusion stage, a modality shared-specific feature interaction structure was designed based on a mixed-attention mechanism, effectively suppressing low-quality modality noise while enhancing the information from the dominant modality. Evaluation on multiple RGB-T public datasets demonstrates that our proposed tracker outperforms other RGB-T trackers on general evaluation metrics while also being able to adapt to long-term tracking scenarios. |
format | Online Article Text |
id | pubmed-10384326 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10384326 2023-07-30 Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking Luo, Yang Guo, Xiqing Dong, Mingtao Yu, Jin Sensors (Basel) Article RGB-T tracking involves the use of images from both visible and thermal modalities. The primary objective is to adaptively leverage the relatively dominant modality in varying conditions to achieve more robust tracking compared to single-modality tracking. An RGB-T tracker based on a mixed-attention mechanism to achieve a complementary fusion of modalities (referred to as MACFT) is proposed in this paper. In the feature extraction stage, we utilize different transformer backbone branches to extract specific and shared information from different modalities. By performing mixed-attention operations in the backbone to enable information interaction and self-enhancement between the template and search images, a robust feature representation is constructed that better understands the high-level semantic features of the target. Then, in the feature fusion stage, a modality shared-specific feature interaction structure was designed based on a mixed-attention mechanism, effectively suppressing low-quality modality noise while enhancing the information from the dominant modality. Evaluation on multiple RGB-T public datasets demonstrates that our proposed tracker outperforms other RGB-T trackers on general evaluation metrics while also being able to adapt to long-term tracking scenarios. MDPI 2023-07-22 /pmc/articles/PMC10384326/ /pubmed/37514902 http://dx.doi.org/10.3390/s23146609 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Luo, Yang Guo, Xiqing Dong, Mingtao Yu, Jin Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking |
title | Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking |
title_full | Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking |
title_fullStr | Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking |
title_full_unstemmed | Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking |
title_short | Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking |
title_sort | learning modality complementary features with mixed attention mechanism for rgb-t tracking |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384326/ https://www.ncbi.nlm.nih.gov/pubmed/37514902 http://dx.doi.org/10.3390/s23146609 |
work_keys_str_mv | AT luoyang learningmodalitycomplementaryfeatureswithmixedattentionmechanismforrgbttracking AT guoxiqing learningmodalitycomplementaryfeatureswithmixedattentionmechanismforrgbttracking AT dongmingtao learningmodalitycomplementaryfeatureswithmixedattentionmechanismforrgbttracking AT yujin learningmodalitycomplementaryfeatureswithmixedattentionmechanismforrgbttracking |