CTT: CNN Meets Transformer for Tracking

Siamese networks are one of the most popular directions in deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and cross-correlation perform feature fusion and the matching of features extracted from the template and search branches, respectiv...

Bibliographic Details
Main authors: Yang, Chen; Zhang, Ximing; Song, Zongxi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9105974/
https://www.ncbi.nlm.nih.gov/pubmed/35590900
http://dx.doi.org/10.3390/s22093210
author Yang, Chen
Zhang, Ximing
Song, Zongxi
collection PubMed
description Siamese networks are one of the most popular directions in deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and cross-correlation perform, respectively, feature fusion and the matching of features extracted from the template and search branches. However, object tracking should focus on global and contextual dependencies. Hence, we introduce into the neck of our tracker a carefully designed residual transformer with an encoder-decoder structure that contains a self-attention mechanism. Within this structure, the encoder promotes interaction between the low-level features that the CNN extracts from the target and search branches to obtain global attention information, while the decoder replaces cross-correlation and sends the global attention information to the head module. We add a spatial and channel attention component to the target branch, which further improves the accuracy and robustness of the proposed model at low cost. Finally, we evaluate our tracker, CTT, in detail on the GOT-10k, VOT2019, OTB-100, LaSOT, NfS, UAV123 and TrackingNet benchmarks, where the proposed method obtains results competitive with state-of-the-art algorithms.
format Online
Article
Text
id pubmed-9105974
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9105974 2022-05-14 CTT: CNN Meets Transformer for Tracking Yang, Chen Zhang, Ximing Song, Zongxi Sensors (Basel) Article MDPI 2022-04-22 /pmc/articles/PMC9105974/ /pubmed/35590900 http://dx.doi.org/10.3390/s22093210 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title CTT: CNN Meets Transformer for Tracking
topic Article