Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild
This paper investigates multimodal sensor-input architectures with deep learning for audio-visual speech recognition (AVSR), focusing on in-the-wild scenarios. The term "in the wild" describes AVSR applied to unconstrained natural-language audio streams and video-stream modalities. Audio-visual speech recognition…
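The abstract refers to fusing an audio stream with a visual (lip-region) stream inside a deep network. As a rough, hypothetical illustration only, a late-fusion model might be sketched in PyTorch as below; the layer choices, feature dimensions, and vocabulary size are assumptions for the sketch, not the architecture reported in the paper.

```python
# Minimal, hypothetical sketch of late-fusion audio-visual speech recognition.
# All dimensions and layer choices are illustrative assumptions, not the
# architecture described in the paper being cataloged here.
import torch
import torch.nn as nn


class AudioVisualFusion(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=512, hidden=256, vocab_size=500):
        super().__init__()
        # Audio branch: GRU over per-frame acoustic features (e.g., filterbanks).
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        # Visual branch: GRU over per-frame lip-region embeddings.
        self.visual_enc = nn.GRU(visual_dim, hidden, batch_first=True)
        # Late fusion: concatenate the two sequence encodings, then classify
        # each time step (e.g., for a CTC-style training objective).
        self.classifier = nn.Linear(2 * hidden, vocab_size)

    def forward(self, audio, visual):
        # audio: (batch, T, audio_dim), visual: (batch, T, visual_dim)
        a, _ = self.audio_enc(audio)
        v, _ = self.visual_enc(visual)
        fused = torch.cat([a, v], dim=-1)
        return self.classifier(fused)  # (batch, T, vocab_size)


if __name__ == "__main__":
    model = AudioVisualFusion()
    audio = torch.randn(2, 75, 40)     # 2 clips, 75 frames of acoustic features
    visual = torch.randn(2, 75, 512)   # matching 75 frames of visual features
    print(model(audio, visual).shape)  # torch.Size([2, 75, 500])
```

The sketch assumes both modalities are already synchronized to a common frame rate; simple concatenation is used only to keep the example short.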
Main authors: He, Yibo; Seng, Kah Phooi; Ang, Li Minn
Format: Online Article Text
Language: English
Published: MDPI, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9959127/ | https://www.ncbi.nlm.nih.gov/pubmed/36850432 | http://dx.doi.org/10.3390/s23041834
Similar items
- Binary Neural Networks in FPGAs: Architectures, Tool Flows and Hardware Comparisons
  by: Su, Yuanxin, et al.
  Published: (2023)
- Multimodal analytics for next-generation big data technologies and applications
  by: Seng, Kah Phooi, et al.
  Published: (2019)
- Natural Inspired Intelligent Visual Computing and Its Application to Viticulture
  by: Ang, Li Minn, et al.
  Published: (2017)
- Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices
  by: Ryumin, Dmitry, et al.
  Published: (2023)
- Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications
  by: Jeon, Sanghun, et al.
  Published: (2022)