Real-Time Sound Source Localization for Low-Power IoT Devices Based on Multi-Stream CNN

Voice-activated artificial intelligence (AI) technology has advanced rapidly and is being adopted in various devices such as smart speakers and display products, which enable users to multitask without touching the devices. However, most devices equipped with cameras and displays lack mobility; ther...

Bibliographic Details
Main Authors:	Ko, Jungbeom; Kim, Hyunchul; Kim, Jungsuk
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9230768/ https://www.ncbi.nlm.nih.gov/pubmed/35746430 http://dx.doi.org/10.3390/s22124650

author	Ko, Jungbeom Kim, Hyunchul Kim, Jungsuk
author_sort	Ko, Jungbeom
collection	PubMed
description	Voice-activated artificial intelligence (AI) technology has advanced rapidly and is being adopted in various devices such as smart speakers and display products, which enable users to multitask without touching the devices. However, most devices equipped with cameras and displays lack mobility; therefore, users cannot avoid touching them for face-to-face interactions, which contradicts the voice-activated AI philosophy. In this paper, we propose a deep neural network-based real-time sound source localization (SSL) model for low-power internet of things (IoT) devices based on microphone arrays and present a prototype implemented on actual IoT devices. The proposed SSL model delivers multi-channel acoustic data to parallel convolutional neural network layers in the form of multiple streams to capture the unique delay patterns for the low-, mid-, and high-frequency ranges, and estimates the fine and coarse location of voices. The model adapted in this study achieved an accuracy of 91.41% on fine location estimation and a direction of arrival error of 7.43° on noisy data. It achieved a processing time of 7.811 ms per 40 ms samples on the Raspberry Pi 4B. The proposed model can be applied to a camera-based humanoid robot that mimics the manner in which humans react to trigger voices in crowded environments.
format	Online Article Text
id	pubmed-9230768
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-92307682022-06-25 Real-Time Sound Source Localization for Low-Power IoT Devices Based on Multi-Stream CNN Ko, Jungbeom Kim, Hyunchul Kim, Jungsuk Sensors (Basel) Article Voice-activated artificial intelligence (AI) technology has advanced rapidly and is being adopted in various devices such as smart speakers and display products, which enable users to multitask without touching the devices. However, most devices equipped with cameras and displays lack mobility; therefore, users cannot avoid touching them for face-to-face interactions, which contradicts the voice-activated AI philosophy. In this paper, we propose a deep neural network-based real-time sound source localization (SSL) model for low-power internet of things (IoT) devices based on microphone arrays and present a prototype implemented on actual IoT devices. The proposed SSL model delivers multi-channel acoustic data to parallel convolutional neural network layers in the form of multiple streams to capture the unique delay patterns for the low-, mid-, and high-frequency ranges, and estimates the fine and coarse location of voices. The model adapted in this study achieved an accuracy of 91.41% on fine location estimation and a direction of arrival error of 7.43° on noisy data. It achieved a processing time of 7.811 ms per 40 ms samples on the Raspberry Pi 4B. The proposed model can be applied to a camera-based humanoid robot that mimics the manner in which humans react to trigger voices in crowded environments. MDPI 2022-06-20 /pmc/articles/PMC9230768/ /pubmed/35746430 http://dx.doi.org/10.3390/s22124650 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Real-Time Sound Source Localization for Low-Power IoT Devices Based on Multi-Stream CNN
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9230768/ https://www.ncbi.nlm.nih.gov/pubmed/35746430 http://dx.doi.org/10.3390/s22124650

Similar Items