Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing
Speech discrimination that determines whether a participant is speaking at a given moment is essential in investigating human verbal communication. Specifically, in dynamic real-world situations where multiple people participate in, and form, groups in the same space, simultaneous speakers render speech discrimination that is solely based on audio sensing difficult. In this study, we focused on physical activity during speech, and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Thus, utterance and physical activity data of students in a university participatory class were recorded, using smartphones worn around their neck. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities in wide-frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications.
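The abstract describes the fusion approach only at a high level. As a rough illustration of the general idea, the following is a minimal, hypothetical Python sketch of per-window audio-motion feature fusion for speech/non-speech classification. The specific features (audio RMS energy, accelerometer-magnitude statistics), window length, sampling rates, and the random-forest classifier are all assumptions made for this sketch; the record does not specify the paper's actual pipeline.

```python
# A minimal sketch (not the authors' reported pipeline) of fusing windowed
# audio and accelerometer features for speech/non-speech classification.
# Feature choices, window length, sampling rates, and the classifier are
# illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def window_features(audio, accel, sr_audio=8000, sr_accel=100, win_s=1.0):
    """Return one fused feature vector per time window.

    audio : 1-D array of microphone samples
    accel : (N, 3) array of accelerometer samples (x, y, z)
    """
    n_win = int(min(len(audio) / (sr_audio * win_s),
                    len(accel) / (sr_accel * win_s)))
    feats = []
    for i in range(n_win):
        a = audio[int(i * sr_audio * win_s):int((i + 1) * sr_audio * win_s)]
        m = accel[int(i * sr_accel * win_s):int((i + 1) * sr_accel * win_s)]
        mag = np.linalg.norm(m, axis=1)   # per-sample motion magnitude
        feats.append([
            np.sqrt(np.mean(a ** 2)),     # audio RMS energy
            np.std(a),                    # audio variability
            np.mean(mag),                 # mean motion intensity
            np.std(mag),                  # motion variability
        ])
    return np.asarray(feats)

# Random placeholders stand in for real recordings and manually labeled
# speaking / not-speaking windows.
rng = np.random.default_rng(0)
X = window_features(rng.standard_normal(8000 * 60),
                    rng.standard_normal((100 * 60, 3)))
y = rng.integers(0, 2, size=len(X))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.3f}")
```

In the study itself, classifiers were trained both per participant and across individuals on labeled utterance data; the sketch above mirrors only the general audio-motion fusion idea.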
Main Authors: | Nozawa, Takayuki; Uchiyama, Mizuki; Honda, Keigo; Nakano, Tamio; Miyake, Yoshihiro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287755/ https://www.ncbi.nlm.nih.gov/pubmed/32456031 http://dx.doi.org/10.3390/s20102948 |
_version_ | 1783545122080686080 |
---|---|
author | Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro |
author_facet | Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro |
author_sort | Nozawa, Takayuki |
collection | PubMed |
description | Speech discrimination that determines whether a participant is speaking at a given moment is essential in investigating human verbal communication. Specifically, in dynamic real-world situations where multiple people participate in, and form, groups in the same space, simultaneous speakers render speech discrimination that is solely based on audio sensing difficult. In this study, we focused on physical activity during speech, and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Thus, utterance and physical activity data of students in a university participatory class were recorded, using smartphones worn around their neck. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities in wide-frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications. |
format | Online Article Text |
id | pubmed-7287755 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-72877552020-06-15 Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro Sensors (Basel) Article Speech discrimination that determines whether a participant is speaking at a given moment is essential in investigating human verbal communication. Specifically, in dynamic real-world situations where multiple people participate in, and form, groups in the same space, simultaneous speakers render speech discrimination that is solely based on audio sensing difficult. In this study, we focused on physical activity during speech, and hypothesized that combining audio and physical motion data acquired by wearable sensors can improve speech discrimination. Thus, utterance and physical activity data of students in a university participatory class were recorded, using smartphones worn around their neck. First, we tested the temporal relationship between manually identified utterances and physical motions and confirmed that physical activities in wide-frequency ranges co-occurred with utterances. Second, we trained and tested classifiers for each participant and found a higher performance with the audio-motion classifier (average accuracy 92.2%) than both the audio-only (80.4%) and motion-only (87.8%) classifiers. Finally, we tested inter-individual classification and obtained a higher performance with the audio-motion combined classifier (83.2%) than the audio-only (67.7%) and motion-only (71.9%) classifiers. These results show that audio-motion multimodal sensing using widely available smartphones can provide effective utterance discrimination in dynamic group communications. MDPI 2020-05-22 /pmc/articles/PMC7287755/ /pubmed/32456031 http://dx.doi.org/10.3390/s20102948 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Nozawa, Takayuki Uchiyama, Mizuki Honda, Keigo Nakano, Tamio Miyake, Yoshihiro Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_full | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_fullStr | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_full_unstemmed | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_short | Speech Discrimination in Real-World Group Communication Using Audio-Motion Multimodal Sensing |
title_sort | speech discrimination in real-world group communication using audio-motion multimodal sensing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287755/ https://www.ncbi.nlm.nih.gov/pubmed/32456031 http://dx.doi.org/10.3390/s20102948 |
work_keys_str_mv | AT nozawatakayuki speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT uchiyamamizuki speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT hondakeigo speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT nakanotamio speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing AT miyakeyoshihiro speechdiscriminationinrealworldgroupcommunicationusingaudiomotionmultimodalsensing |