
A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning

Bibliographic Details
Main authors: Teng, Xiaoyu; Gui, Xiaolin; Xu, Pan; Tong, Jianglei; An, Jian; Liu, Yang; Jiang, Huilan
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9657952/
https://www.ncbi.nlm.nih.gov/pubmed/36365972
http://dx.doi.org/10.3390/s22218275
_version_ 1784829825012203520
author Teng, Xiaoyu
Gui, Xiaolin
Xu, Pan
Tong, Jianglei
An, Jian
Liu, Yang
Jiang, Huilan
author_sort Teng, Xiaoyu
collection PubMed
description Video summarization (VS) is a widely used technique for facilitating the efficient reading, fast comprehension, and effective retrieval of video content. Certain properties of new video data, such as a lack of prominent emphasis and fuzzy boundaries in theme development, disrupt conventional approaches based on video feature information. Moreover, they introduce new challenges for the extraction of video depth and breadth features. In addition, the diversity of user requirements further complicates accurate keyframe screening. To overcome these challenges, this paper proposes a hierarchical spatial–temporal cross-attention scheme for video summarization based on contrastive learning. Graph attention networks (GAT) and a multi-head convolutional attention cell are used to extract local and depth features, while a GAT-adjusted bidirectional ConvLSTM (DB-ConvLSTM) is used to extract global and breadth features. Furthermore, a spatial–temporal cross-attention-based ConvLSTM is developed to merge hierarchical characteristics and achieve more accurate screening within clusters of similar keyframes. Verification experiments and comparative analysis demonstrate that our method outperforms state-of-the-art methods.
format Online
Article
Text
id pubmed-9657952
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9657952 2022-11-15 A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning Teng, Xiaoyu; Gui, Xiaolin; Xu, Pan; Tong, Jianglei; An, Jian; Liu, Yang; Jiang, Huilan Sensors (Basel) Article MDPI 2022-10-28 /pmc/articles/PMC9657952/ /pubmed/36365972 http://dx.doi.org/10.3390/s22218275 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9657952/
https://www.ncbi.nlm.nih.gov/pubmed/36365972
http://dx.doi.org/10.3390/s22218275