Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
Main authors: | Tseng, Shu-Ming; Yeh, Zhi-Ting; Wu, Chia-Yang; Chang, Jia-Bin; Norouzi, Mehdi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458897/ https://www.ncbi.nlm.nih.gov/pubmed/37631590 http://dx.doi.org/10.3390/s23167050 |
_version_ | 1785097276738240512 |
---|---|
author | Tseng, Shu-Ming; Yeh, Zhi-Ting; Wu, Chia-Yang; Chang, Jia-Bin; Norouzi, Mehdi |
collection | PubMed |
description | This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet scans the video shots with a rolling window and uses a transformer encoder to encode shot features extracted by a fine-tuned 3D CNN model. Based on these encoded features, a linker establishes links between shots, and TELNet places scene boundaries where consecutive shots are not linked. TELNet was trained on multiple video scene detection datasets and achieved results comparable to other state-of-the-art models in standard settings. In cross-dataset evaluations, it achieved significantly higher F-scores. Furthermore, TELNet's computational complexity grows linearly with the number of shots, which makes it efficient for long videos. |
format | Online Article Text |
id | pubmed-10458897 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10458897 2023-08-27; Sensors (Basel); Communication; MDPI 2023-08-09; /pmc/articles/PMC10458897/; /pubmed/37631590; http://dx.doi.org/10.3390/s23167050; Text; en; © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Video Scene Detection Using Transformer Encoding Linker Network (TELNet) |
topic | Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10458897/ https://www.ncbi.nlm.nih.gov/pubmed/37631590 http://dx.doi.org/10.3390/s23167050 |
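The abstract describes the linking and boundary step only at a high level. The sketch below illustrates that step under stated assumptions: shot features are plain vectors standing in for the transformer-encoded 3D CNN features, a cosine-similarity threshold replaces TELNet's learned linker, and a cut between two consecutive shots counts as a scene boundary when no link crosses it. The names `link_shots`, `scene_boundaries`, `window`, and `threshold` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def link_shots(features: Sequence[Sequence[float]], window: int = 4,
               threshold: float = 0.8) -> Set[Tuple[int, int]]:
    """Link each shot to later shots inside a rolling window.

    Hypothetical stand-in for TELNet's learned linker: the pair (i, j)
    is linked when their cosine similarity is >= threshold. Each shot is
    compared with at most window - 1 successors, so work is O(n * window).
    """
    links: Set[Tuple[int, int]] = set()
    n = len(features)
    for i in range(n):
        for j in range(i + 1, min(i + window, n)):
            if cosine(features[i], features[j]) >= threshold:
                links.add((i, j))
    return links


def scene_boundaries(num_shots: int, links: Set[Tuple[int, int]]) -> List[int]:
    """Return every k such that a new scene starts at shot k + 1.

    Assumption: the cut between shots k and k + 1 is a boundary when no
    link (i, j) crosses it, i.e. there is no link with i <= k < j.
    """
    # reach[i] is the furthest shot linked from shot i (itself by default).
    reach = list(range(num_shots))
    for i, j in links:
        reach[i] = max(reach[i], j)

    boundaries: List[int] = []
    furthest = -1  # furthest shot reached by any link starting at or before k
    for k in range(num_shots - 1):
        furthest = max(furthest, reach[k])
        if furthest <= k:  # no link jumps over the cut after shot k
            boundaries.append(k)
    return boundaries


if __name__ == "__main__":
    # Toy features: shots 0-2 resemble each other, as do shots 3-5.
    shots = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
             [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
    print(scene_boundaries(len(shots), link_shots(shots)))  # → [2]
```

Each shot is compared with at most `window - 1` successors, and the boundary pass is a single sweep, so for a fixed window the work grows linearly with the number of shots. This mirrors the linear-complexity claim in the abstract, although the real linker is a trained network rather than a fixed threshold.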