
Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions

Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walk...


Bibliographic Details
Main Authors: Kim, Jaehyun, Yoon, Seongwook, Choi, Taehyeon, Sull, Sanghoon
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385872/
https://www.ncbi.nlm.nih.gov/pubmed/37514551
http://dx.doi.org/10.3390/s23146256
_version_ 1785081519611576320
author Kim, Jaehyun
Yoon, Seongwook
Choi, Taehyeon
Sull, Sanghoon
author_facet Kim, Jaehyun
Yoon, Seongwook
Choi, Taehyeon
Sull, Sanghoon
author_sort Kim, Jaehyun
collection PubMed
description Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walking” or “people fighting”, which can be easily obtained, customized for specific applications, and applied to unseen abnormal videos not included in the training dataset. We explore the potential of using these text descriptions with unlabeled video datasets. We use large language models to obtain text descriptions and leverage them to detect abnormal frames by calculating the cosine similarity between the input frame and the text descriptions using the CLIP visual language model. To enhance performance, we refine the CLIP-derived cosine similarity using an unlabeled dataset and the proposed text-conditional similarity, a similarity measure between two vectors based on additional learnable parameters and a triplet loss. The proposed method has a simple training and inference process that avoids the computationally intensive analysis of optical flow or multiple frames. The experimental results demonstrate that the proposed method outperforms unsupervised methods, showing 8% and 13% better AUC scores for the ShanghaiTech and UCFcrime datasets, respectively. Although the proposed method shows 6% and 5% lower AUC scores than weakly supervised methods on those datasets, it achieves 17% and 5% better AUC scores on abnormal videos, indicating that it is comparable with weakly supervised methods, which require resource-intensive dataset labeling. These outcomes validate the potential of using text descriptions in unsupervised video anomaly detection.
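The abstract above outlines the core scoring step: embed each video frame and each predefined text description with CLIP and compare them by cosine similarity. Below is a minimal sketch of that baseline step only, not the authors' refined text-conditional similarity or triplet-loss training; it assumes the Hugging Face transformers CLIP API, and the prompts and file path are illustrative placeholders.

```python
# Minimal sketch: score one video frame against predefined normal/abnormal
# text descriptions via CLIP cosine similarity. Not the paper's implementation;
# prompts, scoring rule, and the frame path are assumptions for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

normal_texts = ["a person walking", "people standing on a sidewalk"]   # assumed examples
abnormal_texts = ["people fighting", "a person falling down"]          # assumed examples
texts = normal_texts + abnormal_texts

frame = Image.open("frame_000123.jpg")  # hypothetical extracted frame

inputs = processor(text=texts, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the frame embedding and each text embedding.
img_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
txt_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
sims = (img_emb @ txt_emb.T).squeeze(0)  # shape: (len(texts),)

# One simple frame-level anomaly score: best match among abnormal prompts
# minus best match among normal prompts (positive -> more likely abnormal).
n = len(normal_texts)
anomaly_score = sims[n:].max() - sims[:n].max()
print(f"anomaly score: {anomaly_score.item():.4f}")
```

Because the score depends only on single-frame and text embeddings, this kind of pipeline needs no optical flow or multi-frame analysis at inference time, which is the efficiency property the abstract emphasizes.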
format Online
Article
Text
id pubmed-10385872
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-103858722023-07-30 Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions Kim, Jaehyun Yoon, Seongwook Choi, Taehyeon Sull, Sanghoon Sensors (Basel) Article Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walking” or “people fighting”, which can be easily obtained, customized for specific applications, and applied to unseen abnormal videos not included in the training dataset. We explore the potential of using these text descriptions with unlabeled video datasets. We use large language models to obtain text descriptions and leverage them to detect abnormal frames by calculating the cosine similarity between the input frame and the text descriptions using the CLIP visual language model. To enhance performance, we refine the CLIP-derived cosine similarity using an unlabeled dataset and the proposed text-conditional similarity, a similarity measure between two vectors based on additional learnable parameters and a triplet loss. The proposed method has a simple training and inference process that avoids the computationally intensive analysis of optical flow or multiple frames. The experimental results demonstrate that the proposed method outperforms unsupervised methods, showing 8% and 13% better AUC scores for the ShanghaiTech and UCFcrime datasets, respectively. Although the proposed method shows 6% and 5% lower AUC scores than weakly supervised methods on those datasets, it achieves 17% and 5% better AUC scores on abnormal videos, indicating that it is comparable with weakly supervised methods, which require resource-intensive dataset labeling. These outcomes validate the potential of using text descriptions in unsupervised video anomaly detection. MDPI 2023-07-09 /pmc/articles/PMC10385872/ /pubmed/37514551 http://dx.doi.org/10.3390/s23146256 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kim, Jaehyun
Yoon, Seongwook
Choi, Taehyeon
Sull, Sanghoon
Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
title Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
title_full Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
title_fullStr Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
title_full_unstemmed Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
title_short Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
title_sort unsupervised video anomaly detection based on similarity with predefined text descriptions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385872/
https://www.ncbi.nlm.nih.gov/pubmed/37514551
http://dx.doi.org/10.3390/s23146256
work_keys_str_mv AT kimjaehyun unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions
AT yoonseongwook unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions
AT choitaehyeon unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions
AT sullsanghoon unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions