
What the Appearance Channel from Two-Stream Architectures for Activity Recognition Is Learning?

The automatic recognition of human activities from video data is being led by spatio-temporal Convolutional Neural Networks (3D CNNs), in particular two-stream architectures such as I3D, which reports the best accuracy so far. Despite the high accuracy of this kind of architecture, ver...


Bibliographic Details
Main authors:	Oves García, Reinier; Sucar, L. Enrique
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7297582/ http://dx.doi.org/10.1007/978-3-030-49076-8_24

_version_	1783547036583329792
author	Oves García, Reinier Sucar, L. Enrique
author_sort	Oves García, Reinier
collection	PubMed
description	The automatic recognition of human activities from video data is being led by spatio-temporal Convolutional Neural Networks (3D CNNs), in particular two-stream architectures such as I3D, which reports the best accuracy so far. Despite the high accuracy of this kind of architecture, very little is known about what these models are really learning from data, which results in a lack of robustness and explainability. In this work we select the appearance channel from the I3D architecture and design a set of experiments aimed at explaining what this model is learning. Through the proposed experiments, we provide evidence that this particular model is learning the texture of the largest area (which can be the activity or the background, depending on the distance from the camera to the action performed). In addition, we state several considerations to take into account when selecting the training data to achieve better generalization of the model for human activity recognition.
format	Online Article Text
id	pubmed-7297582
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-72975822020-06-17 What the Appearance Channel from Two-Stream Architectures for Activity Recognition Is Learning? Oves García, Reinier Sucar, L. Enrique Pattern Recognition Article The automatic recognition of human activities from video data is being led by spatio-temporal Convolutional Neural Networks (3D CNNs), in particular two-stream architectures such as I3D that reports the best accuracy so far. Despite the high performance in accuracy of this kind of architectures, very little is known about what they are really learning from data, resulting therefore in a lack of robustness and explainability. In this work we select the appearance channel from the I3D architecture and create a set of experiments aimed at explaining what this model is learning. Throughout the proposed experiments we provide evidence that this particular model is learning the texture of the largest area (which can be the activity or the background, depending on the distance from the camera to the action performed). In addition, we state several considerations to take into account when selecting the training data to achieve a better generalization of the model for human activity recognition. 2020-04-29 /pmc/articles/PMC7297582/ http://dx.doi.org/10.1007/978-3-030-49076-8_24 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	What the Appearance Channel from Two-Stream Architectures for Activity Recognition Is Learning?
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7297582/ http://dx.doi.org/10.1007/978-3-030-49076-8_24
