
HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking

Siamese network-based trackers consider tracking as feature cross-correlation between the target template and the search region. Therefore, feature representation plays an important role in constructing a high-performance tracker. However, all existing Siamese networks extract the deep but low-res...

Bibliographic Details
Main Authors:	Zhang, Dawei; Zheng, Zhonglong; Wang, Tianxiang; He, Yiran
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506602/ https://www.ncbi.nlm.nih.gov/pubmed/32858872 http://dx.doi.org/10.3390/s20174807

_version_	1783585051977449472
author	Zhang, Dawei; Zheng, Zhonglong; Wang, Tianxiang; He, Yiran
author_sort	Zhang, Dawei
collection	PubMed
description	Siamese network-based trackers consider tracking as feature cross-correlation between the target template and the search region. Therefore, feature representation plays an important role in constructing a high-performance tracker. However, all existing Siamese networks extract the deep but low-resolution features of the entire patch, which is not robust enough to estimate the target bounding box accurately. In this work, to address this issue, we propose a novel high-resolution Siamese network, which connects the high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions to maintain high-resolution representations. Through a simple yet effective multi-scale feature fusion strategy, the resulting representation is semantically richer and spatially more precise. Moreover, we exploit attention mechanisms to learn object-aware masks for adaptive feature refinement, and use deformable convolution to handle complex geometric transformations. This makes the target more discriminative against distractors and background. Without bells and whistles, extensive experiments on popular tracking benchmarks including OTB100, UAV123, VOT2018 and LaSOT demonstrate that the proposed tracker achieves state-of-the-art performance and runs in real time, confirming its efficiency and effectiveness.
format	Online Article Text
id	pubmed-7506602
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7506602 2020-09-26 HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking. Zhang, Dawei; Zheng, Zhonglong; Wang, Tianxiang; He, Yiran. Sensors (Basel), Article. MDPI 2020-08-26 /pmc/articles/PMC7506602/ /pubmed/32858872 http://dx.doi.org/10.3390/s20174807 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking
topic	Article
