Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions
Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walk...
Main Authors: | Kim, Jaehyun; Yoon, Seongwook; Choi, Taehyeon; Sull, Sanghoon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385872/ https://www.ncbi.nlm.nih.gov/pubmed/37514551 http://dx.doi.org/10.3390/s23146256 |
_version_ | 1785081519611576320 |
---|---|
author | Kim, Jaehyun Yoon, Seongwook Choi, Taehyeon Sull, Sanghoon |
author_facet | Kim, Jaehyun Yoon, Seongwook Choi, Taehyeon Sull, Sanghoon |
author_sort | Kim, Jaehyun |
collection | PubMed |
description | Research on video anomaly detection has mainly been based on video data. However, in many real-world cases users can anticipate the normal and abnormal situations that may occur within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walking” or “people fighting”, which can be easily obtained, customized for specific applications, and applied to unseen abnormal videos not included in the training dataset. We explore the potential of using these text descriptions with unlabeled video datasets. We use large language models to obtain text descriptions and leverage them to detect abnormal frames by calculating the cosine similarity between the input frame and the text descriptions using the CLIP visual language model. To enhance performance, we refine the CLIP-derived cosine similarity using an unlabeled dataset and the proposed text-conditional similarity, a similarity measure between two vectors based on additional learnable parameters and a triplet loss. The proposed method has a simple training and inference process that avoids computationally intensive analysis of optical flow or multiple frames. The experimental results show that the proposed method outperforms unsupervised methods, with 8% and 13% better AUC scores on the ShanghaiTech and UCFcrime datasets, respectively. Although its AUC scores are 6% and 5% lower than those of weakly supervised methods on those datasets, on abnormal videos the proposed method achieves 17% and 5% better AUC scores, which indicates that it is comparable to weakly supervised methods that require resource-intensive dataset labeling. These outcomes validate the potential of using text descriptions in unsupervised video anomaly detection. |
format | Online Article Text |
id | pubmed-10385872 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10385872 2023-07-30 Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions Kim, Jaehyun Yoon, Seongwook Choi, Taehyeon Sull, Sanghoon Sensors (Basel) Article Research on video anomaly detection has mainly been based on video data. However, in many real-world cases users can anticipate the normal and abnormal situations that may occur within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as “walking” or “people fighting”, which can be easily obtained, customized for specific applications, and applied to unseen abnormal videos not included in the training dataset. We explore the potential of using these text descriptions with unlabeled video datasets. We use large language models to obtain text descriptions and leverage them to detect abnormal frames by calculating the cosine similarity between the input frame and the text descriptions using the CLIP visual language model. To enhance performance, we refine the CLIP-derived cosine similarity using an unlabeled dataset and the proposed text-conditional similarity, a similarity measure between two vectors based on additional learnable parameters and a triplet loss. The proposed method has a simple training and inference process that avoids computationally intensive analysis of optical flow or multiple frames. The experimental results show that the proposed method outperforms unsupervised methods, with 8% and 13% better AUC scores on the ShanghaiTech and UCFcrime datasets, respectively. Although its AUC scores are 6% and 5% lower than those of weakly supervised methods on those datasets, on abnormal videos the proposed method achieves 17% and 5% better AUC scores, which indicates that it is comparable to weakly supervised methods that require resource-intensive dataset labeling. These outcomes validate the potential of using text descriptions in unsupervised video anomaly detection. MDPI 2023-07-09 /pmc/articles/PMC10385872/ /pubmed/37514551 http://dx.doi.org/10.3390/s23146256 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Kim, Jaehyun Yoon, Seongwook Choi, Taehyeon Sull, Sanghoon Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions |
title | Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions |
title_full | Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions |
title_fullStr | Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions |
title_full_unstemmed | Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions |
title_short | Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions |
title_sort | unsupervised video anomaly detection based on similarity with predefined text descriptions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10385872/ https://www.ncbi.nlm.nih.gov/pubmed/37514551 http://dx.doi.org/10.3390/s23146256 |
work_keys_str_mv | AT kimjaehyun unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions AT yoonseongwook unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions AT choitaehyeon unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions AT sullsanghoon unsupervisedvideoanomalydetectionbasedonsimilaritywithpredefinedtextdescriptions |
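
The description field in this record mentions scoring frames by cosine similarity against text descriptions and refining the similarity with a triplet loss. The following is a minimal sketch of those two generic building blocks, not the authors' implementation. The scoring rule (best abnormal match minus best normal match), the margin value, all function names, and the toy 3-dimensional vectors standing in for CLIP embeddings are assumptions made for illustration; the paper's text-conditional similarity with learnable parameters is not reproduced here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def cosine_similarity(frames, texts):
    """Return an (n_frames, n_texts) matrix of cosine similarities."""
    return l2_normalize(frames) @ l2_normalize(texts).T


def anomaly_scores(frame_emb, normal_text_emb, abnormal_text_emb):
    """Assumed scoring rule: best abnormal match minus best normal match."""
    best_normal = cosine_similarity(frame_emb, normal_text_emb).max(axis=1)
    best_abnormal = cosine_similarity(frame_emb, abnormal_text_emb).max(axis=1)
    return best_abnormal - best_normal


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic hinge triplet loss on cosine similarity (margin is an assumed value)."""
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(negative)
    sim_pos = np.sum(a * p, axis=-1)
    sim_neg = np.sum(a * n, axis=-1)
    # Penalize triplets where the negative is not at least `margin` less similar.
    return float(np.mean(np.maximum(0.0, margin + sim_neg - sim_pos)))


# Toy stand-ins for CLIP embeddings (real ones would come from a CLIP encoder).
normal_text = np.array([[1.0, 0.0, 0.0]])     # e.g. "walking"
abnormal_text = np.array([[0.0, 1.0, 0.0]])   # e.g. "people fighting"
frames = np.array([[1.0, 0.1, 0.0],           # close to the normal description
                   [0.1, 1.0, 0.0]])          # close to the abnormal description

scores = anomaly_scores(frames, normal_text, abnormal_text)
print(scores)  # higher means more anomalous under the assumed rule
```

In practice the toy arrays would be replaced by image and text embeddings from a CLIP model, and the triplet loss would be minimized over learnable parameters rather than evaluated on fixed vectors.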