I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance

We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances o...

Bibliographic Details
Main Authors:	Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866718/ https://www.ncbi.nlm.nih.gov/pubmed/27176486 http://dx.doi.org/10.1371/journal.pone.0154486

_version_	1782431956230733824
author	Hantke, Simone; Weninger, Felix; Kurle, Richard; Ringeval, Fabien; Batliner, Anton; Mousa, Amr El-Desoky; Schuller, Björn
author_sort	Hantke, Simone
collection	PubMed
description	We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i. e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i. e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient.
format	Online Article Text
id	pubmed-4866718
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4866718 2016-05-18 PLoS One Research Article Public Library of Science 2016-05-13 /pmc/articles/PMC4866718/ /pubmed/27176486 http://dx.doi.org/10.1371/journal.pone.0154486 Text en © 2016 Hantke et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866718/ https://www.ncbi.nlm.nih.gov/pubmed/27176486 http://dx.doi.org/10.1371/journal.pone.0154486
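
The description above reports results as average recall under a leave-one-speaker-out evaluation. The following is a minimal sketch of those two ideas only, not the authors' pipeline. It assumes that "average recall" means the unweighted mean of per-class recalls, and it swaps the SVM for a trivial majority-class baseline so the example stays self-contained. The speaker IDs, labels, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally, whatever its size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def majority_label(train_labels):
    """Placeholder 'classifier' (stands in for the SVM): most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


# Hypothetical toy data: three speakers, three utterances each.
speakers = ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3"]
labels = ["no_food", "no_food", "apple",
          "no_food", "no_food", "apple",
          "no_food", "no_food", "banana"]

predictions = [None] * len(labels)
for held_out, train_idx, test_idx in leave_one_speaker_out(speakers):
    # The held-out speaker never contributes to training in this fold.
    guess = majority_label([labels[i] for i in train_idx])
    for i in test_idx:
        predictions[i] = guess

uar = unweighted_average_recall(labels, predictions)
accuracy = sum(t == p for t, p in zip(labels, predictions)) / len(labels)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this toy data the baseline always predicts `no_food`, so it scores an accuracy of 2/3 but an unweighted average recall of only 1/3. That gap is why a per-class average is informative when one class (here, not eating) dominates. For real experiments, scikit-learn's `LeaveOneGroupOut` and `recall_score(..., average="macro")` cover the same ground.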
