
Weakly Supervised Violence Detection in Surveillance Video

Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignorin...


Bibliographic Details
Main authors: Choqueluque-Roman, David, Camara-Chavez, Guillermo
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9231349/
https://www.ncbi.nlm.nih.gov/pubmed/35746286
http://dx.doi.org/10.3390/s22124502
_version_ 1784735316700037120
author Choqueluque-Roman, David
Camara-Chavez, Guillermo
author_facet Choqueluque-Roman, David
Camara-Chavez, Guillermo
author_sort Choqueluque-Roman, David
collection PubMed
description Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignoring important events, because humans have limited ability to pay attention to multiple targets at a time. To overcome this problem, researchers have proposed several methods to detect violent events automatically. So far, most previous studies have focused only on classifying short clips without performing spatial localization. In this work, we tackle this problem by proposing a weakly supervised method to detect violent actions spatially and temporally in surveillance videos using only video-level labels. The proposed method follows a Fast-RCNN-style architecture that has been temporally extended. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then, given an input video and the action proposals, we extract spatiotemporal features using deep neural networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube as violent or non-violent. We obtain results similar to the state of the art on three public datasets (Hockey Fight, RLVSD, and RWF-2000), achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively.
format Online
Article
Text
id pubmed-9231349
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9231349 2022-06-25 Weakly Supervised Violence Detection in Surveillance Video Choqueluque-Roman, David Camara-Chavez, Guillermo Sensors (Basel) Article Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignoring important events, because humans have limited ability to pay attention to multiple targets at a time. To overcome this problem, researchers have proposed several methods to detect violent events automatically. So far, most previous studies have focused only on classifying short clips without performing spatial localization. In this work, we tackle this problem by proposing a weakly supervised method to detect violent actions spatially and temporally in surveillance videos using only video-level labels. The proposed method follows a Fast-RCNN-style architecture that has been temporally extended. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then, given an input video and the action proposals, we extract spatiotemporal features using deep neural networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube as violent or non-violent. We obtain results similar to the state of the art on three public datasets (Hockey Fight, RLVSD, and RWF-2000), achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively. MDPI 2022-06-14 /pmc/articles/PMC9231349/ /pubmed/35746286 http://dx.doi.org/10.3390/s22124502 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Choqueluque-Roman, David
Camara-Chavez, Guillermo
Weakly Supervised Violence Detection in Surveillance Video
title Weakly Supervised Violence Detection in Surveillance Video
title_full Weakly Supervised Violence Detection in Surveillance Video
title_fullStr Weakly Supervised Violence Detection in Surveillance Video
title_full_unstemmed Weakly Supervised Violence Detection in Surveillance Video
title_short Weakly Supervised Violence Detection in Surveillance Video
title_sort weakly supervised violence detection in surveillance video
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9231349/
https://www.ncbi.nlm.nih.gov/pubmed/35746286
http://dx.doi.org/10.3390/s22124502
work_keys_str_mv AT choqueluqueromandavid weaklysupervisedviolencedetectioninsurveillancevideo
AT camarachavezguillermo weaklysupervisedviolencedetectioninsurveillancevideo
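
Illustrative note on the description field: the abstract states that a multiple-instance learning classifier labels each action tube using only video-level labels, but it does not say how tube scores are pooled. The sketch below is one common way to set this up, not the authors' implementation. It assumes max pooling over tubes and a binary cross-entropy loss; the function names (`video_score`, `mil_loss`, `violent_tubes`) and the 0.5 threshold are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw classifier output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def video_score(tube_logits):
    """Assumed MIL pooling: a video (bag) scores as high as its most
    violent action tube (instance)."""
    if not tube_logits:
        raise ValueError("a video needs at least one action tube")
    return max(sigmoid(z) for z in tube_logits)


def mil_loss(tube_logits, video_label):
    """Binary cross-entropy between the pooled video score and the
    video-level label (1 = violent, 0 = non-violent)."""
    # Clamp to avoid log(0) for saturated scores.
    p = min(max(video_score(tube_logits), 1e-7), 1.0 - 1e-7)
    return -(video_label * math.log(p) + (1 - video_label) * math.log(1.0 - p))


def violent_tubes(tube_logits, threshold=0.5):
    """Indices of tubes whose probability reaches the (hypothetical)
    threshold; this is what gives spatiotemporal localization."""
    return [i for i, z in enumerate(tube_logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    logits = [-2.0, 3.0, -0.5]  # made-up scores for three tubes in one video
    print(video_score(logits))
    print(mil_loss(logits, video_label=1))
    print(violent_tubes(logits))
```

With max pooling, only the highest-scoring tube receives a gradient from the video-level label during training, which is why per-tube labels are not needed; other pooling choices (mean, top-k, attention) are equally plausible readings of the abstract.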