
Multi-Level Fusion Temporal–Spatial Co-Attention for Video-Based Person Re-Identification

A convolutional neural network can easily fall into local minima when training data are insufficient, and its training becomes unstable. Many current methods address these problems by adding pedestrian attributes, pedestrian poses, and other auxiliary information, but such information requires additional collection, which is time-consuming and laborious. Different frames of a video sequence have different degrees of similarity. In this paper, multi-level fusion temporal–spatial co-attention is adopted to improve video-based person re-identification (reID). On small datasets, the improved network better prevents over-fitting and is less constrained by the limited data. Specifically, the concept of knowledge evolution is introduced into video-based person re-identification to improve the backbone residual neural network (ResNet). A global branch, a local branch, and an attention branch are used in parallel for feature extraction, and three high-level features are embedded in a metric learning network to improve the network's generalization ability and the accuracy of video-based person re-identification. Experiments on the small datasets PRID2011 and iLIDS-VID show that the improved network better prevents over-fitting, while experiments on MARS and DukeMTMC-VideoReID show that the proposed method extracts more feature information and improves generalization. The results show that our method achieves better performance, reaching 90.15% Rank-1 and 81.91% mAP on MARS.
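The abstract's parallel global/local/attention branch design can be pictured with a minimal sketch. The following Python (PyTorch) snippet is purely illustrative and is not the authors' released code: the ResNet-50 backbone, the stripe-based local branch, the 1x1-convolution spatial attention, the temporal averaging over frames, and all dimensions (num_stripes, num_classes, feat_dim) are assumptions chosen only to make the idea of three parallel feature-extraction branches concrete.

# Hypothetical sketch of a three-branch video reID head (not the paper's exact architecture).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ThreeBranchVideoReID(nn.Module):
    """Illustrative global / local / attention branches over a shared ResNet-50
    backbone, with temporal averaging across the frames of a tracklet."""
    def __init__(self, num_classes=625, feat_dim=2048, num_stripes=4):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep the convolutional stages only (drop avgpool and fc).
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.global_pool = nn.AdaptiveAvgPool2d(1)                 # global branch
        self.local_pool = nn.AdaptiveAvgPool2d((num_stripes, 1))   # local (stripe) branch
        # Simple spatial attention: 1x1 conv -> sigmoid weight map.
        self.attn = nn.Sequential(nn.Conv2d(feat_dim, 1, kernel_size=1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim * (2 + num_stripes), num_classes)

    def forward(self, clips):
        # clips: (batch, frames, 3, H, W)
        b, t, c, h, w = clips.shape
        feat_map = self.backbone(clips.view(b * t, c, h, w))       # (b*t, 2048, h', w')
        g = self.global_pool(feat_map).flatten(1)                  # global feature
        l = self.local_pool(feat_map).flatten(1)                   # stripe features
        a = self.global_pool(feat_map * self.attn(feat_map)).flatten(1)  # attended feature
        # Concatenate the branch outputs and average over the temporal dimension.
        feats = torch.cat([g, l, a], dim=1).view(b, t, -1).mean(dim=1)
        return feats, self.classifier(feats)

# Usage example with random tensors: 2 tracklets of 4 frames at 256x128.
model = ThreeBranchVideoReID(num_classes=625)
clip = torch.randn(2, 4, 3, 256, 128)
features, logits = model(clip)

In the paper's setting, the high-level branch features would additionally feed the metric learning network described in the abstract (e.g., a triplet-style loss alongside the classification head); that part is omitted from this sketch.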


Bibliographic Details
Main Authors: Pei, Shengyu; Fan, Xiaoping
Format: Online Article Text
Language: English
Journal: Entropy (Basel)
Published: MDPI, 15 December 2021
Subjects: Article
Collection: PubMed (National Center for Biotechnology Information)
Record ID: pubmed-8700156 (MEDLINE/PubMed)
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8700156/
https://www.ncbi.nlm.nih.gov/pubmed/34945992
http://dx.doi.org/10.3390/e23121686
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).