Cargando…
A Robust Visual Tracking Method Based on Reconstruction Patch Transformer Tracking
Recently, the transformer model has progressed from the field of visual classification to target tracking. Its primary method replaces the cross-correlation operation in the Siamese tracker. The backbone of the network is still a convolutional neural network (CNN). However, the existing transformer-...
Autores principales: | , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
MDPI
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9460596/ https://www.ncbi.nlm.nih.gov/pubmed/36081017 http://dx.doi.org/10.3390/s22176558 |
Sumario: | Recently, the transformer model has progressed from the field of visual classification to target tracking. Its primary method replaces the cross-correlation operation in the Siamese tracker. The backbone of the network is still a convolutional neural network (CNN). However, the existing transformer-based tracker simply deforms the features extracted by the CNN into patches and feeds them into the transformer encoder. Each patch contains a single element of the spatial dimension of the extracted features and inputs into the transformer structure to use cross-attention instead of cross-correlation operations. This paper proposes a reconstruction patch strategy which combines the extracted features with multiple elements of the spatial dimension into a new patch. The reconstruction operation has the following advantages: (1) the correlation between adjacent elements combines well, and the features extracted by the CNN are usable for classification and regression; (2) using the performer operation reduces the amount of network computation and the dimension of the patch sent to the transformer, thereby sharply reducing the network parameters and improving the model-tracking speed. |
---|