Multi-Modal Representation via Contrastive Learning with Attention Bottleneck Fusion and Attentive Statistics Features

The integration of information from multiple modalities is a highly active area of research. Previous techniques have predominantly focused on fusing shallow features or high-level representations generated by deep unimodal networks, which capture only a subset of the hierarchical relationships across modalities. Moreover, previous methods are often limited in their ability to exploit the fine-grained statistical features inherent in multimodal data. This paper proposes an approach that densely integrates representations by computing the means and standard deviations of image features. These global feature statistics afford a holistic perspective, capturing the overarching distribution and trends inherent in the data and thereby facilitating enhanced comprehension and characterization of multimodal data. We also leverage a Transformer-based fusion encoder to effectively capture global variations in multimodal features. To further enhance the learning process, we incorporate a contrastive loss function that encourages the discovery of shared information across different modalities. To validate the effectiveness of our approach, we conduct experiments on three widely used multimodal sentiment analysis datasets. The results demonstrate the efficacy of our proposed method, achieving significant performance improvements compared to existing approaches.
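The record does not include the authors' implementation; as a rough illustration of two ingredients named in the abstract, the sketch below shows attentive statistics pooling (an attention-weighted mean and standard deviation over token-level image features) and a simple symmetric InfoNCE-style contrastive loss between image and text embeddings. All class names, dimensions, and the exact loss formulation are illustrative assumptions, not the paper's actual code.

```python
# Hedged sketch (not the authors' code): attentive statistics pooling plus a
# symmetric InfoNCE-style contrastive loss. Names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveStatisticsPooling(nn.Module):
    """Pool a token sequence into [attention-weighted mean ; weighted std]."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token attention score

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tokens, dim)
        alpha = torch.softmax(self.score(feats), dim=1)              # (B, T, 1)
        mean = (alpha * feats).sum(dim=1)                            # (B, D)
        var = (alpha * (feats - mean.unsqueeze(1)) ** 2).sum(dim=1)  # (B, D)
        std = torch.sqrt(var.clamp_min(1e-6))
        return torch.cat([mean, std], dim=-1)                        # (B, 2D)


def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched image/text pairs in a batch are positives."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                             # (B, B)
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Usage example with random features standing in for unimodal encoder outputs.
if __name__ == "__main__":
    pool = AttentiveStatisticsPooling(dim=512)
    image_tokens = torch.randn(8, 49, 512)   # e.g. patch-level image features
    pooled = pool(image_tokens)              # (8, 1024): [mean ; std]
    text_emb = torch.randn(8, 1024)          # assumed projected to the same size
    print(contrastive_loss(pooled, text_emb).item())
```

Note that concatenating the mean and standard deviation doubles the feature dimension, so in this sketch the paired text embedding is assumed to be projected into that same 2D-dimensional space before the contrastive loss is applied.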

Bibliographic Details
Main Authors: Guo, Qinglang; Liao, Yong; Li, Zhe; Liang, Shenglin
Format: Online Article Text
Language: English
Journal: Entropy (Basel)
Published: MDPI, 7 October 2023
Subjects: Article
Collection: PubMed, National Center for Biotechnology Information (record pubmed-10606612, MEDLINE/PubMed format)
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10606612/
https://www.ncbi.nlm.nih.gov/pubmed/37895542
http://dx.doi.org/10.3390/e25101421
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).