Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing
Speech discrimination that determines whether a participant is speaking at a given moment is essential in investigating human verbal communication. Specifically, in dynamic real-world situations where multiple people participate in, and form, groups in the same space, simultaneous speakers render speech discrimination that is solely based on audio sensing difficult. In this study, we focused on physical activity during speech, and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Thus, utterance and physical activity data of students in a university participatory class were recorded, using smartphones worn around their neck. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities in wide-frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications.
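The abstract describes the fusion approach only at a high level. As a rough illustration of the general idea, the following is a minimal, hypothetical Python sketch of per-window audio-motion feature fusion for speech/non-speech classification. The specific features (audio RMS energy, accelerometer-magnitude statistics), window length, sampling rates, and the random-forest classifier are all assumptions made for this sketch; the record does not specify the paper's actual pipeline.

```python
# A minimal sketch (not the authors' reported pipeline) of fusing windowed
# audio and accelerometer features for speech/non-speech classification.
# Feature choices, window length, sampling rates, and the classifier are
# illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def window_features(audio, accel, sr_audio=8000, sr_accel=100, win_s=1.0):
    """Return one fused feature vector per time window.

    audio : 1-D array of microphone samples
    accel : (N, 3) array of accelerometer samples (x, y, z)
    """
    n_win = int(min(len(audio) / (sr_audio * win_s),
                    len(accel) / (sr_accel * win_s)))
    feats = []
    for i in range(n_win):
        a = audio[int(i * sr_audio * win_s):int((i + 1) * sr_audio * win_s)]
        m = accel[int(i * sr_accel * win_s):int((i + 1) * sr_accel * win_s)]
        mag = np.linalg.norm(m, axis=1)   # per-sample motion magnitude
        feats.append([
            np.sqrt(np.mean(a ** 2)),     # audio RMS energy
            np.std(a),                    # audio variability
            np.mean(mag),                 # mean motion intensity
            np.std(mag),                  # motion variability
        ])
    return np.asarray(feats)

# Random placeholders stand in for real recordings and manually labeled
# speaking / not-speaking windows.
rng = np.random.default_rng(0)
X = window_features(rng.standard_normal(8000 * 60),
                    rng.standard_normal((100 * 60, 3)))
y = rng.integers(0, 2, size=len(X))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.3f}")
```

In the study itself, classifiers were trained both per participant and across individuals on labeled utterance data; the sketch above mirrors only the general audio-motion fusion idea.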
Main Authors: | Nozawa, Takayuki; Uchiyama, Mizuki; Honda, Keigo; Nakano, Tamio; Miyake, Yoshihiro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287755/ https://www.ncbi.nlm.nih.gov/pubmed/32456031 http://dx.doi.org/10.3390/s20102948 |
_version_ | 1783545122080686080 |
---|---|
author | Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro |
author_facet | Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro |
author_sort | Nozawa, Takayuki |
collection | PubMed |
description | Speech discrimination that determines whether a participant is speaking at a given moment is essential in investigating human verbal communication. Specifically, in dynamic real-world situations where multiple people participate in, and form, groups in the same space, simultaneous speakers render speech discrimination that is solely based on audio sensing difficult. In this study, we focused on physical activity during speech, and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Thus, utterance and physical activity data of students in a university participatory class were recorded, using smartphones worn around their neck. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities in wide-frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications. |
format | Online Article Text |
id | pubmed-7287755 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-72877552020-06-15 Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro Sensors (Basel) Article Speech discrimination that determines whether a participant is speaking at a given moment is essential in investigating human verbal communication. Specifically, in dynamic real-world situations where multiple people participate in, and form, groups in the same space, simultaneous speakers render speech discrimination that is solely based on audio sensing difficult. In this study, we focused on physical activity during speech, and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Thus, utterance and physical activity data of students in a university participatory class were recorded, using smartphones worn around their neck. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities in wide-frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications. MDPI 2020-05-22 /pmc/articles/PMC7287755/ /pubmed/32456031 http://dx.doi.org/10.3390/s20102948 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_full | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_fullStr | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_full_unstemmed | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_short | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_sort | speech discrimination in real-world group communication using audio-motion multimodal sensing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287755/ https://www.ncbi.nlm.nih.gov/pubmed/32456031 http://dx.doi.org/10.3390/s20102948 |
work_keys_str_mv | AT nozawatakayuki speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT uchiyamamizuki speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT hondakeigo speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT nakanotamio speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT miyakeyoshihiro speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing |