A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning
Video summarization (VS) is a widely used technique for facilitating the effective reading, fast comprehension, and effective retrieval of video content. Certain properties of the new video data, such as a lack of prominent emphasis and a fuzzy theme development border, disturb the original thinking...
Main authors: | Teng, Xiaoyu; Gui, Xiaolin; Xu, Pan; Tong, Jianglei; An, Jian; Liu, Yang; Jiang, Huilan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9657952/ https://www.ncbi.nlm.nih.gov/pubmed/36365972 http://dx.doi.org/10.3390/s22218275 |
_version_ | 1784829825012203520 |
---|---|
author | Teng, Xiaoyu Gui, Xiaolin Xu, Pan Tong, Jianglei An, Jian Liu, Yang Jiang, Huilan |
author_facet | Teng, Xiaoyu Gui, Xiaolin Xu, Pan Tong, Jianglei An, Jian Liu, Yang Jiang, Huilan |
author_sort | Teng, Xiaoyu |
collection | PubMed |
description | Video summarization (VS) is a widely used technique for facilitating the effective reading, fast comprehension, and effective retrieval of video content. Certain properties of the new video data, such as a lack of prominent emphasis and a fuzzy theme development border, disturb the original thinking mode based on video feature information. Moreover, they introduce new challenges to the extraction of video depth and breadth features. In addition, the diversity of user requirements further complicates accurate keyframe screening. To overcome these challenges, this paper proposes a hierarchical spatial–temporal cross-attention scheme for video summarization based on contrastive learning. Graph attention networks (GAT) and the multi-head convolutional attention cell are used to extract local and depth features, while the GAT-adjusted bidirectional ConvLSTM (DB-ConvLSTM) is used to extract global and breadth features. Furthermore, a spatial–temporal cross-attention-based ConvLSTM is developed for merging hierarchical characteristics and achieving more accurate screening in similar keyframe clusters. Verification experiments and comparative analysis demonstrate that our method outperforms state-of-the-art methods. |
format | Online Article Text |
id | pubmed-9657952 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9657952 2022-11-15 A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning Teng, Xiaoyu Gui, Xiaolin Xu, Pan Tong, Jianglei An, Jian Liu, Yang Jiang, Huilan Sensors (Basel) Article Video summarization (VS) is a widely used technique for facilitating the effective reading, fast comprehension, and effective retrieval of video content. Certain properties of the new video data, such as a lack of prominent emphasis and a fuzzy theme development border, disturb the original thinking mode based on video feature information. Moreover, they introduce new challenges to the extraction of video depth and breadth features. In addition, the diversity of user requirements further complicates accurate keyframe screening. To overcome these challenges, this paper proposes a hierarchical spatial–temporal cross-attention scheme for video summarization based on contrastive learning. Graph attention networks (GAT) and the multi-head convolutional attention cell are used to extract local and depth features, while the GAT-adjusted bidirectional ConvLSTM (DB-ConvLSTM) is used to extract global and breadth features. Furthermore, a spatial–temporal cross-attention-based ConvLSTM is developed for merging hierarchical characteristics and achieving more accurate screening in similar keyframe clusters. Verification experiments and comparative analysis demonstrate that our method outperforms state-of-the-art methods. MDPI 2022-10-28 /pmc/articles/PMC9657952/ /pubmed/36365972 http://dx.doi.org/10.3390/s22218275 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Teng, Xiaoyu Gui, Xiaolin Xu, Pan Tong, Jianglei An, Jian Liu, Yang Jiang, Huilan A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning |
title | A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning |
title_full | A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning |
title_fullStr | A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning |
title_full_unstemmed | A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning |
title_short | A Hierarchical Spatial–Temporal Cross-Attention Scheme for Video Summarization Using Contrastive Learning |
title_sort | hierarchical spatial–temporal cross-attention scheme for video summarization using contrastive learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9657952/ https://www.ncbi.nlm.nih.gov/pubmed/36365972 http://dx.doi.org/10.3390/s22218275 |