VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification

Bibliographic Details
Main Authors: Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862286/
https://www.ncbi.nlm.nih.gov/pubmed/36679456
http://dx.doi.org/10.3390/s23020661
_version_ 1784875055423946752
author Hou, Shangwu
Tuerhong, Gulanbaier
Wushouer, Mairidan
author_facet Hou, Shangwu
Tuerhong, Gulanbaier
Wushouer, Mairidan
author_sort Hou, Shangwu
collection PubMed
description Sentiment classification is a key task in mining people’s opinions, and improved sentiment classification can help individuals make better decisions. Social media users increasingly express their opinions and share their experiences through both images and text, rather than through text alone as in conventional social media. Understanding how to fully exploit both modalities is therefore critical for a variety of tasks, including sentiment classification. In this work, we present a novel multimodal sentiment classification approach: the Visual Distillation and Attention Network (VisdaNet). First, the method introduces a knowledge augmentation module that compensates for the limited information in short texts by integrating image captions with the review text. Second, to address the information-control problem in multimodal fusion for product reviews, we propose a CLIP-based knowledge distillation module that reduces noise in the original modalities and improves the quality of the modal representations. Finally, for the single-text, multi-image fusion problem in product reviews, we propose a CLIP-based visual aspect attention mechanism that models the text-image interaction in this setting and realizes feature-level fusion across modalities (see the illustrative sketch after this record). Experiments on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model, and ablation studies demonstrate the efficacy of each strategy in the proposed model.
format Online
Article
Text
id pubmed-9862286
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9862286 2023-01-22 VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan. Sensors (Basel), Article. MDPI 2023-01-06 /pmc/articles/PMC9862286/ /pubmed/36679456 http://dx.doi.org/10.3390/s23020661 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hou, Shangwu
Tuerhong, Gulanbaier
Wushouer, Mairidan
VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_full VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_fullStr VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_full_unstemmed VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_short VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
title_sort visdanet: visual distillation and attention network for multimodal sentiment classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862286/
https://www.ncbi.nlm.nih.gov/pubmed/36679456
http://dx.doi.org/10.3390/s23020661
work_keys_str_mv AT houshangwu visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification
AT tuerhonggulanbaier visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification
AT wushouermairidan visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification
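
The abstract above describes the model only at a high level. As a purely illustrative aid, the following is a minimal PyTorch sketch of text-conditioned attention over several image embeddings, in the spirit of the "visual aspect attention" the abstract describes; the class name, dimensions, scoring rule, and fusion step are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAspectAttention(nn.Module):
    """Hypothetical sketch: attend over multiple image embeddings
    (e.g. CLIP image features of a review's photos) conditioned on a
    text embedding (e.g. the CLIP text feature of the review)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the text embedding to a query
        self.key = nn.Linear(dim, dim)    # projects each image embedding to a key

    def forward(self, text_emb: torch.Tensor, img_embs: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, dim); img_embs: (batch, n_images, dim)
        q = self.query(text_emb).unsqueeze(1)               # (batch, 1, dim)
        k = self.key(img_embs)                              # (batch, n_images, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5        # scaled dot-product scores
        weights = F.softmax(scores, dim=-1)                 # one attention weight per image
        visual = (weights.unsqueeze(-1) * img_embs).sum(1)  # weighted visual summary
        return torch.cat([text_emb, visual], dim=-1)        # feature-level fusion

# Usage with random stand-in embeddings (two reviews, five photos each):
fusion = VisualAspectAttention(dim=512)
fused = fusion(torch.randn(2, 512), torch.randn(2, 5, 512))
print(fused.shape)  # torch.Size([2, 1024])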