Sound Can Help Us See More Clearly

In the field of video action classification, existing network frameworks often only use video frames as input. When the object involved in the action does not appear in a prominent position in the video frame, the network cannot accurately classify it. We introduce a new neural network structure tha...

Bibliographic Details
Main Authors:	Li, Yongsheng; Tu, Tengfei; Zhang, Hua; Li, Jishuai; Jin, Zhengping; Wen, Qiaoyan
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8778024/ https://www.ncbi.nlm.nih.gov/pubmed/35062558 http://dx.doi.org/10.3390/s22020599

author	Li, Yongsheng; Tu, Tengfei; Zhang, Hua; Li, Jishuai; Jin, Zhengping; Wen, Qiaoyan
collection	PubMed
description	In video action classification, existing network frameworks often use only video frames as input. When the object involved in the action does not appear prominently in the frame, such networks cannot classify the action accurately. We introduce a new neural network structure that uses sound to assist with such tasks. The original sound wave is converted into a sound texture, which serves as the input of the network. Furthermore, to use the rich modal information (images and sound) in the video, we designed a two-stream framework. In this work, we assume that sound data can be used to solve motion recognition tasks. To demonstrate this, we designed a neural network based on sound texture to perform video action classification. We then fuse this network with a deep neural network that takes continuous video frames as input, constructing a two-stream network called A-IN. Finally, on the Kinetics dataset, we compare the proposed A-IN with the image-only network. The experimental results show that the recognition accuracy of the two-stream model that uses sound features is 7.6% higher than that of the network using only video frames. This shows that making rational use of the rich information in the video can improve classification performance.
format	Online Article Text
id	pubmed-8778024
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8778024 2022-01-22 Sound Can Help Us See More Clearly Li, Yongsheng; Tu, Tengfei; Zhang, Hua; Li, Jishuai; Jin, Zhengping; Wen, Qiaoyan Sensors (Basel) Article MDPI 2022-01-13 /pmc/articles/PMC8778024/ /pubmed/35062558 http://dx.doi.org/10.3390/s22020599 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Sound Can Help Us See More Clearly
