A preliminary study on improving the recognition of esophageal speech using a hybrid system based on statistical voice conversion

Bibliographic Details
Main authors:	Lachhab, Othman; Di Martino, Joseph; Elhaj, Elhassane Ibn; Hammouch, Ahmed
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4627987/ https://www.ncbi.nlm.nih.gov/pubmed/26543778 http://dx.doi.org/10.1186/s40064-015-1428-2

Description
Summary:	In this paper, we propose a hybrid system based on a modified statistical GMM (Gaussian mixture model) voice conversion algorithm for improving the recognition of esophageal speech. The system aims to compensate for the distorted information in the esophageal acoustic features by means of voice conversion: the esophageal speech is converted into “target” laryngeal speech using an iterative statistical estimation of a transformation function. No speech synthesizer is applied to reconstruct the converted speech signal, because the converted Mel cepstral vectors are used directly as input to our speech recognition system. Furthermore, the feature vectors are linearly transformed by HLDA (heteroscedastic linear discriminant analysis), which projects them into a lower-dimensional space with good discriminative properties. The experimental results show that the proposed system improves phone recognition accuracy by an absolute 3.40 % compared with the accuracy obtained with neither HLDA nor voice conversion.
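
To make the conversion step in the summary concrete, the sketch below shows the classical joint-density GMM mapping, in which a source feature vector x is converted as the posterior-weighted sum of per-component linear regressions, ŷ = Σ_m p(m | x) [μ_y,m + Σ_yx,m Σ_xx,m⁻¹ (x − μ_x,m)]. This is only an illustration of the general technique, not the modified algorithm or the iterative estimation procedure described in the paper. The function names (`gaussian_logpdf`, `convert_frame`) and the parameter layout are assumptions made for this example, and the GMM parameters are taken as already trained.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at vector x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frame(x, weights, mu_x, mu_y, sigma_xx, sigma_yx):
    """Map one source feature vector x to the target space.

    Hypothetical parameter layout (M mixture components, D dimensions):
      weights  : (M,)      mixture weights
      mu_x     : (M, D)    source means
      mu_y     : (M, D)    target means
      sigma_xx : (M, D, D) source covariances
      sigma_yx : (M, D, D) target/source cross-covariances
    """
    # Posterior probability of each component given x, computed in the
    # log domain and shifted by the maximum for numerical stability.
    log_post = np.array([
        np.log(w) + gaussian_logpdf(x, mx, sxx)
        for w, mx, sxx in zip(weights, mu_x, sigma_xx)
    ])
    log_post -= np.max(log_post)
    post = np.exp(log_post)
    post /= post.sum()

    # Posterior-weighted sum of the per-component linear regressions.
    y = np.zeros_like(mu_y[0], dtype=float)
    for p, mx, my, sxx, syx in zip(post, mu_x, mu_y, sigma_xx, sigma_yx):
        y += p * (my + syx @ np.linalg.solve(sxx, x - mx))
    return y
```

In a pipeline like the one the summary describes, a function of this kind would be applied frame by frame to the esophageal Mel cepstral vectors, and its output passed to the recognizer front end rather than to a synthesizer.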
