CTT: CNN Meets Transformer for Tracking
Siamese networks are one of the most popular directions in deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and cross-correlation perform feature fusion and the matching of features extracted from the template and search branches, respectiv...
Main Authors: | Yang, Chen; Zhang, Ximing; Song, Zongxi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9105974/ https://www.ncbi.nlm.nih.gov/pubmed/35590900 http://dx.doi.org/10.3390/s22093210 |
_version_ | 1784708169321152512 |
---|---|
author | Yang, Chen Zhang, Ximing Song, Zongxi |
author_facet | Yang, Chen Zhang, Ximing Song, Zongxi |
author_sort | Yang, Chen |
collection | PubMed |
description | Siamese networks are one of the most popular directions in deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and cross-correlation perform feature fusion and the matching of features extracted from the template and search branches, respectively. However, object tracking should focus on global and contextual dependencies. Hence, we introduce a delicate residual transformer structure with an encoder-decoder self-attention mechanism into our tracker as part of the neck. Under the encoder-decoder structure, the encoder promotes interaction between the low-level features extracted by the CNN from the target and search branches to obtain global attention information, while the decoder replaces cross-correlation to send the global attention information into the head module. We add a spatial and channel attention component in the target branch, which further improves the accuracy and robustness of our proposed model at low cost. Finally, we evaluate our tracker CTT in detail on the GOT-10k, VOT2019, OTB-100, LaSOT, NfS, UAV123 and TrackingNet benchmarks, and our proposed method obtains results competitive with state-of-the-art algorithms. |
format | Online Article Text |
id | pubmed-9105974 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9105974 2022-05-14 CTT: CNN Meets Transformer for Tracking Yang, Chen Zhang, Ximing Song, Zongxi Sensors (Basel) Article Siamese networks are one of the most popular directions in deep-learning-based visual object tracking. In Siamese networks, the feature pyramid network (FPN) and cross-correlation perform feature fusion and the matching of features extracted from the template and search branches, respectively. However, object tracking should focus on global and contextual dependencies. Hence, we introduce a delicate residual transformer structure with an encoder-decoder self-attention mechanism into our tracker as part of the neck. Under the encoder-decoder structure, the encoder promotes interaction between the low-level features extracted by the CNN from the target and search branches to obtain global attention information, while the decoder replaces cross-correlation to send the global attention information into the head module. We add a spatial and channel attention component in the target branch, which further improves the accuracy and robustness of our proposed model at low cost. Finally, we evaluate our tracker CTT in detail on the GOT-10k, VOT2019, OTB-100, LaSOT, NfS, UAV123 and TrackingNet benchmarks, and our proposed method obtains results competitive with state-of-the-art algorithms. MDPI 2022-04-22 /pmc/articles/PMC9105974/ /pubmed/35590900 http://dx.doi.org/10.3390/s22093210 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Yang, Chen Zhang, Ximing Song, Zongxi CTT: CNN Meets Transformer for Tracking |
title | CTT: CNN Meets Transformer for Tracking |
title_full | CTT: CNN Meets Transformer for Tracking |
title_fullStr | CTT: CNN Meets Transformer for Tracking |
title_full_unstemmed | CTT: CNN Meets Transformer for Tracking |
title_short | CTT: CNN Meets Transformer for Tracking |
title_sort | ctt: cnn meets transformer for tracking |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9105974/ https://www.ncbi.nlm.nih.gov/pubmed/35590900 http://dx.doi.org/10.3390/s22093210 |
work_keys_str_mv | AT yangchen cttcnnmeetstransformerfortracking AT zhangximing cttcnnmeetstransformerfortracking AT songzongxi cttcnnmeetstransformerfortracking |
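
The description in this record contrasts cross-correlation matching with an attention-based decoder. The sketch below illustrates that contrast on toy data. It is not the CTT implementation: the function names, the 1-D layout of the features, and the absence of learned projections, residual connections, and multiple heads are all simplifying assumptions made for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_correlation(template, search):
    """Slide the template over the search sequence.

    Returns one similarity score per offset (a purely local match).
    """
    k = len(template)
    return [
        sum(dot(template[i], search[off + i]) for i in range(k))
        for off in range(len(search) - k + 1)
    ]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(search, template):
    """Single-head scaled dot-product attention (hypothetical simplification).

    Every search position (query) attends over all template positions
    (keys and values), so each output mixes information globally.
    """
    scale = math.sqrt(len(template[0]))
    out = []
    for q in search:
        weights = softmax([dot(q, key) / scale for key in template])
        out.append([
            sum(w * v[c] for w, v in zip(weights, template))
            for c in range(len(q))
        ])
    return out

# Toy features: 2 template positions and 4 search positions, 2 channels each.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

response = cross_correlation(template, search)  # one score per offset
attended = cross_attention(search, template)    # one fused vector per search position
```

Cross-correlation yields a single score per offset, whereas the attention variant returns a feature vector for every search position, weighted over the whole template. That is the sense in which the description speaks of global attention information being passed on to the head module.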