
Hypothesis testing for evaluating a multimodal pattern recognition framework applied to speaker detection


Bibliographic details
Main authors:	Besson, Patricia; Kunt, Murat
Format:	Text
Language:	English
Published:	BioMed Central, 2008
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2390568/ https://www.ncbi.nlm.nih.gov/pubmed/18371191 http://dx.doi.org/10.1186/1743-0003-5-11

collection	PubMed
description	BACKGROUND: Speaker detection is an important component of many human-computer interaction applications, such as multimedia indexing or ambient intelligence systems. This work addresses the problem of detecting the current speaker in audio-visual sequences. The detector requires only simple equipment: a single camera and a single microphone suffice. METHOD: A multimodal pattern recognition framework is proposed, with solutions provided for each step of the process, namely the feature generation and extraction steps, the classification, and the evaluation of system performance. The decision is based on an estimate of the synchrony between the audio and video signals. Prior to classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then formulated as a hypothesis testing problem in order to associate confidence levels with the classifier outputs, thereby allowing the performance of the whole multimodal pattern recognition system to be evaluated. RESULTS: Through the hypothesis testing approach, the classifier performance can be expressed as a ratio of detection to false-alarm probabilities. Above all, the hypothesis tests provide a means of measuring the efficiency of the whole pattern recognition process. In particular, the gain offered by the proposed feature extraction step can be evaluated. It is shown that introducing such a feature extraction step increases the ability of the classifier to produce good relative instance scores and, therefore, improves the performance of the pattern recognition process. CONCLUSION: The power of hypothesis tests as an evaluation tool is exploited to assess the performance of a multimodal pattern recognition process. In particular, the benefit of performing a feature extraction step prior to classification, compared with omitting it, is evaluated. Although the proposed framework is used here for detecting the speaker in audio-visual sequences, it could be applied to any other classification task involving two co-occurring spatio-temporal signals.
id	pubmed-2390568
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	J Neuroeng Rehabil (BioMed Central), published 2008-03-27
rights	Copyright © 2008 Besson and Kunt; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
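The abstract describes scoring the synchrony between audio and video and then evaluating the classifier through hypothesis testing, with performance expressed through detection and false-alarm probabilities. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' method. It assumes a plug-in histogram estimate of mutual information as the synchrony score, synthetic Gaussian sequences standing in for an audio feature and a video motion feature, and a Neyman-Pearson-style rule that fixes the false-alarm probability on scores from the non-speaker hypothesis H0 and then reads off the detection probability under the speaker hypothesis H1. All names and parameters (mutual_information, threshold_for_false_alarm, rate_above, simulate_scores, bins=8, p_fa=0.05) are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in nats for two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def quantize(seq):
        lo, hi = min(seq), max(seq)
        width = (hi - lo) / bins or 1.0  # guard against constant sequences
        return [min(int((v - lo) / width), bins - 1) for v in seq]

    qx, qy = quantize(x), quantize(y)
    n = len(qx)
    px, py, pxy = Counter(qx), Counter(qy), Counter(zip(qx, qy))
    # I(X;Y) = sum over cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def threshold_for_false_alarm(null_scores, p_fa):
    """Threshold t such that the empirical P(score > t | H0) does not exceed p_fa."""
    ranked = sorted(null_scores, reverse=True)
    k = math.floor(p_fa * len(ranked))  # how many H0 scores may lie above t
    return ranked[k] if k < len(ranked) else -math.inf


def rate_above(scores, t):
    """Fraction of scores strictly above t, i.e. how often 'speaker' is decided."""
    return sum(s > t for s in scores) / len(scores)


def simulate_scores(n_trials=200, n_frames=100, seed=0):
    """Synthetic stand-in data (assumption): H1 video follows the audio, H0 does not."""
    rng = random.Random(seed)
    h0, h1 = [], []
    for _ in range(n_trials):
        audio = [rng.gauss(0, 1) for _ in range(n_frames)]
        synced = [a + rng.gauss(0, 1) for a in audio]  # speaker: motion tracks audio
        unsynced = [rng.gauss(0, 1) for _ in range(n_frames)]  # non-speaker: independent
        h1.append(mutual_information(audio, synced))
        h0.append(mutual_information(audio, unsynced))
    return h0, h1


h0_scores, h1_scores = simulate_scores()
t = threshold_for_false_alarm(h0_scores, p_fa=0.05)
p_fa_hat = rate_above(h0_scores, t)
p_d_hat = rate_above(h1_scores, t)
print(f"threshold={t:.3f}  P_FA={p_fa_hat:.3f}  P_D={p_d_hat:.3f}")
```

Here the threshold is set, and P_FA is measured, on the same H0 scores, so P_FA is an in-sample estimate; a held-out set of non-speaker scores would give a more honest one. Comparing P_D at a fixed P_FA for two candidate feature sets is one way to quantify the gain of a feature extraction step, in the spirit of the evaluation the abstract describes.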
