Natural-Language-Driven Multimodal Representation Learning for Audio-Visual Scene-Aware Dialog System

With the development of multimedia systems in wireless environments, the rising need for artificial intelligence is to design a system that can properly communicate with humans with a comprehensive understanding of various types of information in a human-like manner. Therefore, this paper addresses...

Bibliographic Details
Main authors:	Heo, Yoonseok; Kang, Sangwoo; Seo, Jungyun
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10536977/ https://www.ncbi.nlm.nih.gov/pubmed/37765933 http://dx.doi.org/10.3390/s23187875

Similar items

Long-term memory representations for audio-visual scenes
by: Meyerhoff, Hauke S., et al.
Published: (2022)

An Efficient Framework for Development of Task-Oriented Dialog Systems in a Smart Home Environment
by: Park, Youngmin, et al.
Published: (2018)

A3CarScene: An audio-visual dataset for driving scene understanding
by: Cantarini, Michela, et al.
Published: (2023)

Spoken natural language dialog systems : a practical approach
by: Smith, Ronnie W
Published: (1994)

Spoken natural language dialog systems: a practical approach
by: Smith, Ronnie W, et al.
Published: (1994)

Cortical Plasticity of Audio–Visual Object Representations
by: Naumer, Marcus J., et al.
Published: (2009)

DIALOG: a language for instrumentation diagnosis
by: Burns, A, et al.
Published: (1987)

Dynamic, Task-Related and Demand-Driven Scene Representation
by: Rebhan, Sven, et al.
Published: (2010)

Generation of stable heading representations in diverse visual scenes
by: Kim, Sung Soo, et al.
Published: (2019)

Combined representation of visual features in the scene-selective cortex
by: Kang, Jisu, et al.
Published: (2023)

Information-Driven Active Audio-Visual Source Localization
by: Schult, Niclas, et al.
Published: (2015)

Spatial frequency supports the emergence of categorical representations in visual cortex during natural scene perception
by: Dima, Diana C., et al.
Published: (2018)

Multimodal Hallucination (Audio-visual, Kinaesthetic and Scenic) Associated with the Use of Zolpidem
by: Ram, Dushad, et al.
Published: (2015)

Systematic literature review on audio-visual multimodal input in listening comprehension
by: Shaojie, Tan, et al.
Published: (2022)

StreetAware: A High-Resolution Synchronized Multimodal Urban Scene Dataset
by: Piadyk, Yurii, et al.
Published: (2023)

Multimodal Scene Understanding
by: Yang, Michael
Published: (2019)

Speakers of different languages remember visual scenes differently
by: Fernandez-Duque, Matias, et al.
Published: (2023)

The Development of Hand-Centered Visual Representations in the Primate Brain: A Computer Modeling Study Using Natural Visual Scenes
by: Galeazzi, Juan M., et al.
Published: (2015)

Authoring Selves in Language Teaching: A Dialogic Approach to Language Teacher Psychology
by: Chen, Shan, et al.
Published: (2022)

Building shared situational awareness in surgery through distributed dialog
by: Gillespie, Brigid M, et al.
Published: (2013)

Is the preference of natural versus man-made scenes driven by bottom–up processing of the visual features of nature?
by: Kardan, Omid, et al.
Published: (2015)

Decoding individual natural scene representations during perception and imagery
by: Johnson, Matthew R., et al.
Published: (2014)

Correction: Information-Driven Active Audio-Visual Source Localization
Published: (2017)

Transmission of natural scene images through a multimode fibre
by: Caramazza, Piergiorgio, et al.
Published: (2019)

The Sound of Vision Project: On the Feasibility of an Audio-Haptic Representation of the Environment, for the Visually Impaired
by: Jóhannesson, Ómar I., et al.
Published: (2016)

Understanding Design Features of Music and Language: The Choric/Dialogic Distinction
by: Haiduk, Felix, et al.
Published: (2022)

Audio Spatial Representation Around the Body
by: Aggius-Vella, Elena, et al.
Published: (2017)

Dynamic Scene Stitching Driven by Visual Cognition Model
by: Zou, Li-hui, et al.
Published: (2014)

Objects sharpen visual scene representations: evidence from MEG decoding
by: Brandman, Talia, et al.
Published: (2023)

Audio Feedback Associated With Body Movement Enhances Audio and Somatosensory Spatial Representation
by: Cuppone, Anna Vera, et al.
Published: (2018)

Summation of perceptual cues in natural visual scenes
by: To, M, et al.
Published: (2008)

Cortical Sensitivity to Visual Features in Natural Scenes
by: Felsen, Gidon, et al.
Published: (2005)

The extraction of natural scene gist in visual crowding
by: Gong, Mingliang, et al.
Published: (2018)

Visual perspective-taking in complex natural scenes
by: Del Sette, Paola, et al.
Published: (2021)

Efficient processing of natural scenes in visual cortex
by: Tesileanu, Tiberiu, et al.
Published: (2022)

Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model
by: Ahmad, Rehan, et al.
Published: (2019)

Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild
by: He, Yibo, et al.
Published: (2023)

Scene-Aware Adaptive Updating for Visual Tracking via Correlation Filters
by: Li, Fan, et al.
Published: (2017)

Representation of visual scenes by local neuronal populations in layer 2/3 of mouse visual cortex
by: Kampa, Björn M., et al.
Published: (2011)

The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities
by: Bracci, Stefania, et al.
Published: (2023)
