
Self-Supervised Learning to Detect Key Frames in Videos

Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between the labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method.
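The abstract describes a two-stream ConvNet that fuses deep appearance and motion features to score which frames of a video are key frames. As an illustration only, the following is a minimal sketch of that general two-stream idea, assuming PyTorch; the layer sizes, the optical-flow input, and the late-fusion scoring head are assumptions made for the sketch and do not reproduce the authors' published architecture or their self-supervised annotation module.

import torch
import torch.nn as nn

class TwoStreamKeyFrameScorer(nn.Module):
    """Illustrative sketch: per-frame key-frame score from RGB + optical flow."""

    def __init__(self):
        super().__init__()
        # Appearance stream: a small ConvNet over a single RGB frame (3 channels).
        self.appearance = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Motion stream: the same backbone over a 2-channel optical-flow field (dx, dy).
        self.motion = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Late fusion: concatenate the two 64-d feature vectors and regress one score.
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, rgb, flow):
        feats = torch.cat([self.appearance(rgb), self.motion(flow)], dim=1)
        return torch.sigmoid(self.head(feats)).squeeze(-1)  # one score in [0, 1] per frame

# Usage sketch: score a batch of 8 frames and keep the 3 highest-scoring as key frames.
model = TwoStreamKeyFrameScorer()
rgb = torch.randn(8, 3, 112, 112)   # placeholder RGB frames
flow = torch.randn(8, 2, 112, 112)  # placeholder optical-flow fields
scores = model(rgb, flow)
key_frame_indices = scores.topk(k=3).indices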

Bibliographic Details
Main Authors: Yan, Xiang, Gilani, Syed Zulqarnain, Feng, Mingtao, Zhang, Liang, Qin, Hanlin, Mian, Ajmal
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731244/
https://www.ncbi.nlm.nih.gov/pubmed/33291759
http://dx.doi.org/10.3390/s20236941
_version_ 1783621861490294784
author Yan, Xiang
Gilani, Syed Zulqarnain
Feng, Mingtao
Zhang, Liang
Qin, Hanlin
Mian, Ajmal
author_facet Yan, Xiang
Gilani, Syed Zulqarnain
Feng, Mingtao
Zhang, Liang
Qin, Hanlin
Mian, Ajmal
author_sort Yan, Xiang
collection PubMed
description Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between the labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method.
format Online
Article
Text
id pubmed-7731244
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-77312442020-12-12 Self-Supervised Learning to Detect Key Frames in Videos Yan, Xiang Gilani, Syed Zulqarnain Feng, Mingtao Zhang, Liang Qin, Hanlin Mian, Ajmal Sensors (Basel) Article Detecting key frames in videos is a common problem in many applications such as video classification, action recognition and video summarization. These tasks can be performed more efficiently using only a handful of key frames rather than the full video. Existing key frame detection approaches are mostly designed for supervised learning and require manual labelling of key frames in a large corpus of training data to train the models. Labelling requires human annotators from different backgrounds to annotate key frames in videos, which is not only expensive and time-consuming but also prone to subjective errors and inconsistencies between the labelers. To overcome these problems, we propose an automatic self-supervised method for detecting key frames in a video. Our method comprises a two-stream ConvNet and a novel automatic annotation architecture able to reliably annotate key frames in a video for self-supervised learning of the ConvNet. The proposed ConvNet learns deep appearance and motion features to detect frames that are unique. The trained network is then able to detect key frames in test videos. Extensive experiments on the UCF101 human action and VSUMM video summarization datasets demonstrate the effectiveness of our proposed method. MDPI 2020-12-04 /pmc/articles/PMC7731244/ /pubmed/33291759 http://dx.doi.org/10.3390/s20236941 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Yan, Xiang
Gilani, Syed Zulqarnain
Feng, Mingtao
Zhang, Liang
Qin, Hanlin
Mian, Ajmal
Self-Supervised Learning to Detect Key Frames in Videos
title Self-Supervised Learning to Detect Key Frames in Videos
title_full Self-Supervised Learning to Detect Key Frames in Videos
title_fullStr Self-Supervised Learning to Detect Key Frames in Videos
title_full_unstemmed Self-Supervised Learning to Detect Key Frames in Videos
title_short Self-Supervised Learning to Detect Key Frames in Videos
title_sort self-supervised learning to detect key frames in videos
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731244/
https://www.ncbi.nlm.nih.gov/pubmed/33291759
http://dx.doi.org/10.3390/s20236941
work_keys_str_mv AT yanxiang selfsupervisedlearningtodetectkeyframesinvideos
AT gilanisyedzulqarnain selfsupervisedlearningtodetectkeyframesinvideos
AT fengmingtao selfsupervisedlearningtodetectkeyframesinvideos
AT zhangliang selfsupervisedlearningtodetectkeyframesinvideos
AT qinhanlin selfsupervisedlearningtodetectkeyframesinvideos
AT mianajmal selfsupervisedlearningtodetectkeyframesinvideos