
A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model

The cocktail party problem can be more effectively addressed by leveraging the speaker’s visual and audio information. This paper proposes a method to improve audio separation using two visual cues: facial features and lip movement. Firstly, residual connections are introduced in the audio sep...


Bibliographic Details
Main authors:	Li, Guizhu, Fu, Min, Sun, Mengnan, Liu, Xuefeng, Zheng, Bing
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10647675/ https://www.ncbi.nlm.nih.gov/pubmed/37960477 http://dx.doi.org/10.3390/s23218770

Similar items

Audio source separation and speech enhancement
by: Vincent, Emmanuel, et al.
Published: (2018)

Audio-Visual Speech Cue Combination
by: Arnold, Derek H., et al.
Published: (2010)

Differential Auditory and Visual Phase-Locking Are Observed during Audio-Visual Benefit and Silent Lip-Reading for Speech Perception
by: Aller, Máté, et al.
Published: (2022)

Audio-Visual Speech Timing Sensitivity Is Enhanced in Cluttered Conditions
by: Roseboom, Warrick, et al.
Published: (2011)

Talker variability in audio-visual speech perception
by: Heald, Shannon L. M., et al.
Published: (2014)

Speech and Non-Speech Audio-Visual Illusions: A Developmental Study
by: Tremblay, Corinne, et al.
Published: (2007)

Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?
by: Alm, Magnus, et al.
Published: (2015)

Adaptation to Social-Linguistic Associations in Audio-Visual Speech
by: Babel, Molly
Published: (2022)

Integrative interaction of emotional speech in audio-visual modality
by: Dong, Haibin, et al.
Published: (2022)

Contributions of local speech encoding and functional connectivity to audio-visual speech perception
by: Giordano, Bruno L, et al.
Published: (2017)

Speech and audio processing for coding, enhancement and recognition
by: Ogunfunmi, Tokunbo, et al.
Published: (2015)

Audio-visual speech perception: a developmental ERP investigation
by: Knowland, Victoria CP, et al.
Published: (2014)

Reliability-Based Large-Vocabulary Audio-Visual Speech Recognition
by: Yu, Wentao, et al.
Published: (2022)

Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices
by: Ryumin, Dmitry, et al.
Published: (2023)

Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children
by: Power, Alan James, et al.
Published: (2012)

Impact of Audio-Visual Asynchrony on Lip-Reading Effects -Neuromagnetic and Psychophysical Study-
by: Kawase, Tetsuaki, et al.
Published: (2016)

Off-Screen Sound Separation Based on Audio-visual Pre-training Using Binaural Audio
by: Yoshida, Masaki, et al.
Published: (2023)

Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications
by: Jeon, Sanghun, et al.
Published: (2022)

Audio Feedback Associated With Body Movement Enhances Audio and Somatosensory Spatial Representation
by: Cuppone, Anna Vera, et al.
Published: (2018)

Erratum: Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children
by: Power, Alan J., et al.
Published: (2013)

No, There Is No 150 ms Lead of Visual Speech on Auditory Speech, but a Range of Audiovisual Asynchronies Varying from Small Audio Lead to Large Audio Lag
by: Schwartz, Jean-Luc, et al.
Published: (2014)

Correlated lip motion and voice audio data
by: Colasito, Marco, et al.
Published: (2018)

Audio source separation
by: Makino, Shoji
Published: (2018)

Facial grimace during speech in cleft lip and palate: a proposal for classification
by: Scarmagnani, Rafaeli Higa, et al.
Published: (2022)

Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception
by: Bejjanki, Vikranth Rao, et al.
Published: (2011)

Cross-Modal Matching of Audio-Visual German and French Fluent Speech in Infancy
by: Kubicek, Claudia, et al.
Published: (2014)

Audio-Visual Perception of Gender by Infants Emerges Earlier for Adult-Directed Speech
by: Richoz, Anne-Raphaëlle, et al.
Published: (2017)

Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments
by: Thézé, Raphaël, et al.
Published: (2020)

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English
by: Livingstone, Steven R., et al.
Published: (2018)

Speech and audio signal processing: processing and perception of speech and music
by: Gold, Bernard, et al.
Published: (2011)

Lip‐closing strength in children is enhanced by lip and facial muscle training
by: Nogami, Yukiko, et al.
Published: (2021)

The Effect of Combined Sensory and Semantic Components on Audio–Visual Speech Perception in Older Adults
by: Maguinness, Corrina, et al.
Published: (2011)

Involvement of Right STS in Audio-Visual Integration for Affective Speech Demonstrated Using MEG
by: Hagan, Cindy C., et al.
Published: (2013)

Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level
by: Kolozsvári, Orsolya B., et al.
Published: (2019)

Semantic Cues Modulate Children’s and Adults’ Processing of Audio-Visual Face Mask Speech
by: Schwarz, Julia, et al.
Published: (2022)

Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild
by: He, Yibo, et al.
Published: (2023)

A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition
by: Mustaqeem, et al.
Published: (2019)

Lip movements entrain the observers’ low-frequency brain oscillations to facilitate speech intelligibility
by: Park, Hyojin, et al.
Published: (2016)

Atypical audio-visual speech perception and McGurk effects in children with specific language impairment
by: Leybaert, Jacqueline, et al.
Published: (2014)

Neural oscillations in the temporal pole for a temporally congruent audio-visual speech detection task
by: Ohki, Takefumi, et al.
Published: (2016)
