VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification
Sentiment classification is a key task in understanding people’s opinions, and improved sentiment classification can help individuals make better decisions. Social media users increasingly use both images and text, rather than text alone, to express their opinions and share their experiences…
Main Authors: | Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862286/ https://www.ncbi.nlm.nih.gov/pubmed/36679456 http://dx.doi.org/10.3390/s23020661 |
_version_ | 1784875055423946752 |
---|---|
author | Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan |
author_facet | Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan |
author_sort | Hou, Shangwu |
collection | PubMed |
description | Sentiment classification is a key task in understanding people’s opinions, and improved sentiment classification can help individuals make better decisions. Social media users increasingly use both images and text, rather than text alone, to express their opinions and share their experiences. Understanding how to fully exploit both modalities is therefore critical for a variety of tasks, including sentiment classification. In this work, we present a new multimodal sentiment classification approach: the visual distillation and attention network (VisdaNet). First, the method introduces a knowledge augmentation module that compensates for the limited information in short review texts by integrating image captions with the text. Second, to address the information-control problem in multimodal fusion for product-review scenarios, we propose a CLIP-based knowledge distillation module that reduces noise in the original modalities and improves the quality of their representations. Finally, to handle the single-text, multi-image fusion problem in product reviews, we propose CLIP-based visual aspect attention, which models the text-image interaction in this setting and performs feature-level fusion across modalities. Experiments on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model, and ablation studies demonstrate the effectiveness of each of the proposed strategies. (A minimal code sketch of the caption-augmentation and attention-fusion steps appears after the record below.) |
format | Online Article Text |
id | pubmed-9862286 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9862286 2023-01-22 VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan Sensors (Basel) Article Sentiment classification is a key task in understanding people’s opinions, and improved sentiment classification can help individuals make better decisions. Social media users increasingly use both images and text, rather than text alone, to express their opinions and share their experiences. Understanding how to fully exploit both modalities is therefore critical for a variety of tasks, including sentiment classification. In this work, we present a new multimodal sentiment classification approach: the visual distillation and attention network (VisdaNet). First, the method introduces a knowledge augmentation module that compensates for the limited information in short review texts by integrating image captions with the text. Second, to address the information-control problem in multimodal fusion for product-review scenarios, we propose a CLIP-based knowledge distillation module that reduces noise in the original modalities and improves the quality of their representations. Finally, to handle the single-text, multi-image fusion problem in product reviews, we propose CLIP-based visual aspect attention, which models the text-image interaction in this setting and performs feature-level fusion across modalities. Experiments on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model, and ablation studies demonstrate the effectiveness of each of the proposed strategies. MDPI 2023-01-06 /pmc/articles/PMC9862286/ /pubmed/36679456 http://dx.doi.org/10.3390/s23020661 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Hou, Shangwu; Tuerhong, Gulanbaier; Wushouer, Mairidan VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification |
title | VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification |
title_full | VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification |
title_fullStr | VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification |
title_full_unstemmed | VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification |
title_short | VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification |
title_sort | visdanet: visual distillation and attention network for multimodal sentiment classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862286/ https://www.ncbi.nlm.nih.gov/pubmed/36679456 http://dx.doi.org/10.3390/s23020661 |
work_keys_str_mv | AT houshangwu visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification AT tuerhonggulanbaier visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification AT wushouermairidan visdanetvisualdistillationandattentionnetworkformultimodalsentimentclassification |
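The description above outlines three mechanisms: augmenting short review text with image captions, CLIP-based knowledge distillation to denoise each modality, and CLIP-based visual aspect attention to fuse one review text with several images. The sketch below illustrates only the caption-augmentation and attention-fusion steps, and it is not the authors' code: `augment_review`, `VisualAspectAttentionSketch`, the plain linear projections standing in for CLIP's encoders, the feature dimensions, and the five output classes (assumed Yelp-style star ratings) are all hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def augment_review(review_text, captions):
    # Knowledge-augmentation idea (assumed format): prepend generated image
    # captions so the short review text carries more context before encoding.
    return " ".join(captions + [review_text])


class VisualAspectAttentionSketch(nn.Module):
    """Scores each review image against the review text and fuses an
    attention-weighted visual summary with the text features."""

    def __init__(self, text_in=300, image_in=2048, dim=512, num_classes=5):
        super().__init__()
        # Stand-ins for the paper's CLIP text/image encoders (assumption:
        # any encoders producing comparable embeddings suffice for the sketch).
        self.text_proj = nn.Linear(text_in, dim)
        self.image_proj = nn.Linear(image_in, dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, text_in) pooled features of the augmented review text
        # image_feats: (B, N, image_in) features of the N images in the review
        t = F.normalize(self.text_proj(text_feats), dim=-1)    # (B, D)
        v = F.normalize(self.image_proj(image_feats), dim=-1)  # (B, N, D)
        # Cosine similarity between the text and each image (CLIP-style matching).
        scores = torch.einsum("bd,bnd->bn", t, v)              # (B, N)
        attn = scores.softmax(dim=-1)                          # one weight per image
        # Attention-weighted visual summary: text-relevant images dominate.
        v_summary = torch.einsum("bn,bnd->bd", attn, v)        # (B, D)
        # Feature-level fusion across modalities, then classification.
        fused = torch.cat([t, v_summary], dim=-1)              # (B, 2D)
        return self.classifier(fused)


# Usage with random stand-in features: one review with three attached images.
text = augment_review("Great pizza!", ["a wood-fired pizza on a plate"])
model = VisualAspectAttentionSketch()
logits = model(torch.randn(1, 300), torch.randn(1, 3, 2048))
print(logits.shape)  # torch.Size([1, 5])
```

In the paper itself, the encoders are CLIP's and a separate distillation objective refines the unimodal features before fusion; this sketch replaces both with random projections purely to show the data flow of the attention-weighted, feature-level fusion the abstract describes.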