VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification

Bibliographic Details
Main Authors: Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862286/
https://www.ncbi.nlm.nih.gov/pubmed/36679456
http://dx.doi.org/10.3390/s23020661
_version_ 1784875055423946752
author Hou, Shangwu
Tuerhong, Gulanbaier
Wushouer, Mairidan
author_facet Hou, Shangwu
Tuerhong, Gulanbaier
Wushouer, Mairidan
author_sort Hou, Shangwu
collection PubMed
description Sentiment classification is a key task in mining people’s opinions, and improved sentiment classification can help individuals make better decisions. Social media users increasingly express their opinions and share their experiences through both images and text, rather than through text alone as in conventional social media. Understanding how to fully exploit both modalities is therefore critical for a variety of tasks, including sentiment classification. In this work, we present a novel multimodal sentiment classification approach: the Visual Distillation and Attention Network (VisdaNet). First, the method introduces a knowledge augmentation module that compensates for the limited information in short texts by integrating image captions with the review text. Second, to address the information-control problem in multimodal fusion for product reviews, we propose a CLIP-based knowledge distillation module that reduces noise in the original modalities and improves the quality of the modal representations. Finally, for the single-text, multi-image fusion problem in product reviews, we propose a CLIP-based visual aspect attention mechanism that models the text-image interaction in this setting and realizes feature-level fusion across modalities (see the illustrative sketch after this record). Experiments on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model, and ablation studies demonstrate the efficacy of each strategy in the proposed model.
format Online
Article
Text
id pubmed-9862286
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9862286 2023-01-22 VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan. Sensors (Basel), Article. MDPI 2023-01-06 /pmc/articles/PMC9862286/ /pubmed/36679456 http://dx.doi.org/10.3390/s23020661 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hou, Shangwu
Tuerhong, Gulanbaier
Wushouer, Mairidan
VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_full VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_fullStr VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_full_unstemmed VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_short VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_sort visdanet: visual distillation and attention network for multimodal sentiment classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862286/
https://www.ncbi.nlm.nih.gov/pubmed/36679456
http://dx.doi.org/10.3390/s23020661
work_keys_str_mv AT houshangwu visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification
AT tuerhonggulanbaier visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification
AT wushouermairidan visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification
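
The abstract above describes the model only at a high level. As a purely illustrative aid, the following is a minimal PyTorch sketch of text-conditioned attention over several image embeddings, in the spirit of the "visual aspect attention" the abstract describes; the class name, dimensions, scoring rule, and fusion step are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAspectAttention(nn.Module):
    """Hypothetical sketch: attend over multiple image embeddings
    (e.g. CLIP image features of a review's photos) conditioned on a
    text embedding (e.g. the CLIP text feature of the review)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the text embedding to a query
        self.key = nn.Linear(dim, dim)    # projects each image embedding to a key

    def forward(self, text_emb: torch.Tensor, img_embs: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, dim); img_embs: (batch, n_images, dim)
        q = self.query(text_emb).unsqueeze(1)               # (batch, 1, dim)
        k = self.key(img_embs)                              # (batch, n_images, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5        # scaled dot-product scores
        weights = F.softmax(scores, dim=-1)                 # one attention weight per image
        visual = (weights.unsqueeze(-1) * img_embs).sum(1)  # weighted visual summary
        return torch.cat([text_emb, visual], dim=-1)        # feature-level fusion

# Usage with random stand-in embeddings (two reviews, five photos each):
fusion = VisualAspectAttention(dim=512)
fused = fusion(torch.randn(2, 512), torch.randn(2, 5, 512))
print(fused.shape)  # torch.Size([2, 1024])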