
CTT: CNN Meets Transformer for Tracking

Siamese networks are among the most popular approaches to deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and the cross-correlation operation perform feature fusion and the matching of features extracted from the template and search branches, respectiv...


Bibliographic Details
Main authors:	Yang, Chen; Zhang, Ximing; Song, Zongxi
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9105974/ https://www.ncbi.nlm.nih.gov/pubmed/35590900 http://dx.doi.org/10.3390/s22093210

collection	PubMed
description	Siamese networks are among the most popular approaches to deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and the cross-correlation operation perform feature fusion and the matching of features extracted from the template and search branches, respectively. However, object tracking should focus on global and contextual dependencies. Hence, we introduce into the neck of our tracker a carefully designed residual transformer with an encoder-decoder structure that contains a self-attention mechanism. In this structure, the encoder promotes interaction between the low-level features that the CNN extracts from the target and search branches to obtain global attention information, while the decoder replaces cross-correlation and sends the global attention information to the head module. We also add a spatial and channel attention component to the target branch, which further improves the accuracy and robustness of the proposed model at low cost. Finally, we evaluate our tracker, CTT, in detail on the GOT-10k, VOT2019, OTB-100, LaSOT, NfS, UAV123 and TrackingNet benchmarks, where the proposed method obtains results competitive with state-of-the-art algorithms.
id	pubmed-9105974
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9105974 2022-05-14 CTT: CNN Meets Transformer for Tracking Yang, Chen Zhang, Ximing Song, Zongxi Sensors (Basel) Article MDPI 2022-04-22 /pmc/articles/PMC9105974/ /pubmed/35590900 http://dx.doi.org/10.3390/s22093210 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
