Using Complexity-Identical Human- and Machine-Directed Utterances to Investigate Addressee Detection for Spoken Dialogue Systems

Human-machine addressee detection (H-M AD) is a modern paralinguistics and dialogue challenge that arises in multiparty conversations between several people and a spoken dialogue system (SDS) since the users may also talk to each other and even to themselves while interacting with the system. The SD...

Bibliographic Details
Main Authors: Akhtiamov, Oleg; Siegert, Ingo; Karpov, Alexey; Minker, Wolfgang
Format: Online Article Text
Language: English
Published: MDPI 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7249173/
https://www.ncbi.nlm.nih.gov/pubmed/32403365
http://dx.doi.org/10.3390/s20092740
author Akhtiamov, Oleg
Siegert, Ingo
Karpov, Alexey
Minker, Wolfgang
collection PubMed
description Human-machine addressee detection (H-M AD) is a modern paralinguistics and dialogue challenge that arises in multiparty conversations between several people and a spoken dialogue system (SDS) since the users may also talk to each other and even to themselves while interacting with the system. The SDS is supposed to determine whether it is being addressed or not. All existing studies on acoustic H-M AD were conducted on corpora designed in such a way that a human addressee and a machine played different dialogue roles. This peculiarity influences speakers’ behaviour and increases vocal differences between human- and machine-directed utterances. In the present study, we consider the Restaurant Booking Corpus (RBC) that consists of complexity-identical human- and machine-directed phone calls and allows us to eliminate most of the factors influencing speakers’ behaviour implicitly. The only remaining factor is the speakers’ explicit awareness of their interlocutor (technical system or human being). Although complexity-identical H-M AD is essentially more challenging than the classical one, we managed to achieve significant improvements using data augmentation (unweighted average recall (UAR) = 0.628) over native listeners (UAR = 0.596) and a baseline classifier presented by the RBC developers (UAR = 0.539).
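The description above reports its results as unweighted average recall (UAR), which is the mean of the per-class recalls, so each class counts equally regardless of how many utterances it contains. The following is a minimal sketch of that metric only; the function name, the label strings, and the toy data are illustrative assumptions and are not taken from the article or from the RBC.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    hits = defaultdict(int)    # correctly predicted items per true class
    totals = defaultdict(int)  # number of items per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration (hypothetical, imbalanced on purpose).
y_true = ["human"] * 8 + ["machine"] * 2
y_pred = ["human"] * 8 + ["human", "machine"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data such as this toy set, accuracy rewards favouring the majority class while UAR does not. In a two-class task, a classifier that always predicts the same class scores a UAR of 0.5, which is the reference point for reading the UAR values quoted in the description.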
format Online
Article
Text
id pubmed-7249173
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7249173 2020-06-10 Sensors (Basel) Article MDPI 2020-05-11 /pmc/articles/PMC7249173/ /pubmed/32403365 http://dx.doi.org/10.3390/s20092740 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Using Complexity-Identical Human- and Machine-Directed Utterances to Investigate Addressee Detection for Spoken Dialogue Systems
topic Article