Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild

This paper investigates multimodal sensor architectures with deep learning for audio-visual speech recognition, focusing on in-the-wild scenarios. The term “in the wild” is used to describe AVSR for unconstrained natural-language audio streams and video-stream modalities. Audio-visual speech recogni...

Bibliographic Details
Main authors: He, Yibo; Seng, Kah Phooi; Ang, Li Minn
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9959127/
https://www.ncbi.nlm.nih.gov/pubmed/36850432
http://dx.doi.org/10.3390/s23041834