Video Scene Detection Using Transformer Encoding Linker Network (TELNet)

This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various vid...

Bibliographic Details
Main authors: Tseng, Shu-Ming, Yeh, Zhi-Ting, Wu, Chia-Yang, Chang, Jia-Bin, Norouzi, Mehdi
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458897/
https://www.ncbi.nlm.nih.gov/pubmed/37631590
http://dx.doi.org/10.3390/s23167050
_version_ 1785097276738240512
author Tseng, Shu-Ming
Yeh, Zhi-Ting
Wu, Chia-Yang
Chang, Jia-Bin
Norouzi, Mehdi
author_facet Tseng, Shu-Ming
Yeh, Zhi-Ting
Wu, Chia-Yang
Chang, Jia-Bin
Norouzi, Mehdi
author_sort Tseng, Shu-Ming
collection PubMed
description This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet utilizes a rolling window to scan through video shots, encoding their features extracted from a fine-tuned 3D CNN model (transformer encoder). By establishing links between video shots based on these encoded features (linker), TELNet efficiently identifies scene boundaries where consecutive shots lack links. TELNet was trained on multiple video scene detection datasets and demonstrated results comparable to other state-of-the-art models in standard settings. Notably, in cross-dataset evaluations, TELNet demonstrated significantly improved results (F-score). Furthermore, TELNet’s computational complexity grows linearly with the number of shots, making it highly efficient in processing long videos.
format Online
Article
Text
id pubmed-10458897
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10458897 2023-08-27 Video Scene Detection Using Transformer Encoding Linker Network (TELNet) Tseng, Shu-Ming Yeh, Zhi-Ting Wu, Chia-Yang Chang, Jia-Bin Norouzi, Mehdi Sensors (Basel) Communication This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet utilizes a rolling window to scan through video shots, encoding their features extracted from a fine-tuned 3D CNN model (transformer encoder). By establishing links between video shots based on these encoded features (linker), TELNet efficiently identifies scene boundaries where consecutive shots lack links. TELNet was trained on multiple video scene detection datasets and demonstrated results comparable to other state-of-the-art models in standard settings. Notably, in cross-dataset evaluations, TELNet demonstrated significantly improved results (F-score). Furthermore, TELNet’s computational complexity grows linearly with the number of shots, making it highly efficient in processing long videos. MDPI 2023-08-09 /pmc/articles/PMC10458897/ /pubmed/37631590 http://dx.doi.org/10.3390/s23167050 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Communication
Tseng, Shu-Ming
Yeh, Zhi-Ting
Wu, Chia-Yang
Chang, Jia-Bin
Norouzi, Mehdi
Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
title Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
title_full Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
title_fullStr Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
title_full_unstemmed Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
title_short Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
title_sort video scene detection using transformer encoding linker network (telnet)
topic Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458897/
https://www.ncbi.nlm.nih.gov/pubmed/37631590
http://dx.doi.org/10.3390/s23167050
work_keys_str_mv AT tsengshuming videoscenedetectionusingtransformerencodinglinkernetworktelnet
AT yehzhiting videoscenedetectionusingtransformerencodinglinkernetworktelnet
AT wuchiayang videoscenedetectionusingtransformerencodinglinkernetworktelnet
AT changjiabin videoscenedetectionusingtransformerencodinglinkernetworktelnet
AT norouzimehdi videoscenedetectionusingtransformerencodinglinkernetworktelnet