
Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions

Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walk...


Bibliographic Details
Main Authors:	Kim, Jaehyun; Yoon, Seongwook; Choi, Taehyeon; Sull, Sanghoon
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385872/ https://www.ncbi.nlm.nih.gov/pubmed/37514551 http://dx.doi.org/10.3390/s23146256

author	Kim, Jaehyun; Yoon, Seongwook; Choi, Taehyeon; Sull, Sanghoon
collection	PubMed
description	Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walking” or “people fighting”, which can be easily obtained, customized for specific applications, and applied to unseen abnormal videos not included in the training dataset. We explore the potential of using these text descriptions with unlabeled video datasets. We use large language models to obtain text descriptions and leverage them to detect abnormal frames by calculating the cosine similarity between the input frame and the text descriptions using the CLIP visual language model. To enhance performance, we refine the CLIP-derived cosine similarity using an unlabeled dataset and the proposed text-conditional similarity, which is a similarity measure between two vectors based on additional learnable parameters and a triplet loss. The proposed method has a simple training and inference process that avoids the computationally intensive analysis of optical flow or multiple frames. The experimental results demonstrate that the proposed method outperforms unsupervised methods, with 8% and 13% better AUC scores on the ShanghaiTech and UCFcrime datasets, respectively. Although its AUC scores are 6% and 5% lower than those of weakly supervised methods on those datasets, on abnormal videos the proposed method achieves 17% and 5% better AUC scores, which means that it is comparable to weakly supervised methods that require resource-intensive dataset labeling. These outcomes validate the potential of using text descriptions in unsupervised video anomaly detection.
format	Online Article Text
id	pubmed-10385872
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10385872 2023-07-30 Sensors (Basel) Article MDPI 2023-07-09 /pmc/articles/PMC10385872/ /pubmed/37514551 http://dx.doi.org/10.3390/s23146256 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385872/ https://www.ncbi.nlm.nih.gov/pubmed/37514551 http://dx.doi.org/10.3390/s23146256
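
The description in this record mentions scoring a frame by the cosine similarity between its embedding and the embeddings of text descriptions. The following is a minimal sketch of that general idea only, not the authors' implementation: the three-dimensional vectors stand in for CLIP embeddings, and the function names, the softmax aggregation, and the temperature value of 0.07 are all assumptions made for illustration. The learned text-conditional similarity and the triplet loss from the paper are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_score(frame_emb, normal_embs, abnormal_embs, temperature=0.07):
    """Hypothetical scoring rule: softmax over frame-to-text similarities,
    returning the probability mass assigned to the abnormal descriptions."""
    sims = [cosine_similarity(frame_emb, t) for t in normal_embs + abnormal_embs]
    logits = [s / temperature for s in sims]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return sum(exps[len(normal_embs):]) / sum(exps)


# Toy stand-ins for text embeddings (a real system would obtain them from CLIP).
normal_texts = [[1.0, 0.0, 0.0]]    # e.g. "walking"
abnormal_texts = [[0.0, 1.0, 0.0]]  # e.g. "people fighting"

calm_frame = [0.9, 0.1, 0.0]
violent_frame = [0.1, 0.9, 0.0]

calm_score = anomaly_score(calm_frame, normal_texts, abnormal_texts)
violent_score = anomaly_score(violent_frame, normal_texts, abnormal_texts)
print(calm_score < 0.5, violent_score > 0.5)
```

A frame whose embedding lies closer to an abnormal description receives a score near 1, and one closer to a normal description receives a score near 0; any threshold for flagging frames would have to be chosen per application.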


Similar Items