Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

Extraction of relevant lip features is of continuing interest in the visual speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric, glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach extracts low-dimensionality lip parameters with a minimum of processing. One key difference between these Gabor-based features and other features, such as traditional DCT or the currently fashionable CNN features, is that they are human-centric features that can be visualised and analysed by humans, which makes the results easier to explain and visualise. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature.
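
As a rough illustration of the pipeline described in the abstract, the Python sketch below applies a small bank of Gabor filters to a grayscale lip region, reduces each filter response to a few geometric numbers (peak magnitude and its position), and feeds the per-frame vectors to a small LSTM classifier. This is a sketch under assumptions, not the authors' implementation: the ROI size, Gabor parameters, per-filter summary, network sizes, and word-level output are illustrative choices only.

# Minimal sketch (not the paper's code): Gabor-based lip features fed to an LSTM.
# ROI size, filter parameters, the per-filter summary, and the output vocabulary
# are assumptions made for illustration.
import cv2
import numpy as np
import torch
import torch.nn as nn

def gabor_lip_features(roi_gray):
    # Summarise a grayscale lip ROI (H x W) with a handful of Gabor responses:
    # for each orientation, keep the peak magnitude and its normalised (row, col),
    # a coarse geometric description of the dominant lip edge at that angle.
    feats = []
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):  # 4 orientations (assumed)
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5)
        resp = np.abs(cv2.filter2D(roi_gray.astype(np.float32), cv2.CV_32F, kern))
        r, c = np.unravel_index(np.argmax(resp), resp.shape)
        feats.extend([resp[r, c], r / resp.shape[0], c / resp.shape[1]])
    return np.array(feats, dtype=np.float32)  # 12-D vector per frame

class LipLSTM(nn.Module):
    # Minimal sequence classifier over per-frame Gabor feature vectors.
    def __init__(self, feat_dim=12, hidden=64, n_words=51):  # 51-word Grid vocabulary
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_words)

    def forward(self, x):  # x: (batch, frames, feat_dim)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])  # logits over the word vocabulary

# Usage: lip ROIs per frame -> feature sequence -> word logits.
# Real ROIs would come from a mouth-region tracker; random frames stand in here.
lip_rois = [np.random.randint(0, 256, (60, 100), dtype=np.uint8) for _ in range(25)]
seq = torch.from_numpy(np.stack([gabor_lip_features(f) for f in lip_rois])).unsqueeze(0)
print(LipLSTM()(seq).shape)  # torch.Size([1, 51])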

Bibliographic Details
Main Authors: Zhang, Xuejie; Xu, Yan; Abel, Andrew K.; Smith, Leslie S.; Watt, Roger; Hussain, Amir; Gao, Chengxiang
Format: Online Article Text
Language: English
Published: MDPI, 2020
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7761847/
https://www.ncbi.nlm.nih.gov/pubmed/33279914
http://dx.doi.org/10.3390/e22121367
Journal: Entropy (Basel)
Collection: PubMed (PMC7761847)
Published online: 2020-12-03
License: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).