Multi-modal recommendation algorithm fusing visual and textual features
Main Authors: | Hu, Xuefeng; Yu, Wenting; Wu, Yun; Chen, Yukang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10310001/ https://www.ncbi.nlm.nih.gov/pubmed/37384736 http://dx.doi.org/10.1371/journal.pone.0287927 |
_version_ | 1785066493664296960 |
---|---|
author | Hu, Xuefeng; Yu, Wenting; Wu, Yun; Chen, Yukang |
author_facet | Hu, Xuefeng; Yu, Wenting; Wu, Yun; Chen, Yukang |
author_sort | Hu, Xuefeng |
collection | PubMed |
description | In recommender systems, the lack of interaction data between users and items tends to lead to data sparsity and cold-start problems. Recently, interest modeling frameworks incorporating multi-modal features have been widely used in recommendation algorithms. These algorithms use image and text features to extend the available information, which effectively alleviates the data sparsity problem, but they also have limitations. On the one hand, the multi-modal features of user interaction sequences are not considered in the interest modeling process. On the other hand, the aggregation of multi-modal features often relies on simple aggregators, such as summation and concatenation, which do not distinguish the importance of different feature interactions. To tackle these issues, we propose the FVTF (Fusing Visual and Textual Features) algorithm. First, we design a user history visual preference extraction module based on Query-Key-Value attention to model users’ historical interests from visual features. Second, we design a feature fusion and interaction module based on multi-head bit-wise attention to adaptively mine important feature combinations and update the higher-order attention fusion representation of features. Experiments on the MovieLens-1M dataset show that FVTF achieves the best performance compared with the benchmark recommendation algorithms. (An illustrative sketch of these two attention modules follows the record table below.) |
format | Online Article Text |
id | pubmed-10310001 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10310001 2023-06-30 Multi-modal recommendation algorithm fusing visual and textual features Hu, Xuefeng Yu, Wenting Wu, Yun Chen, Yukang PLoS One Research Article In recommender systems, the lack of interaction data between users and items tends to lead to data sparsity and cold-start problems. Recently, interest modeling frameworks incorporating multi-modal features have been widely used in recommendation algorithms. These algorithms use image and text features to extend the available information, which effectively alleviates the data sparsity problem, but they also have limitations. On the one hand, the multi-modal features of user interaction sequences are not considered in the interest modeling process. On the other hand, the aggregation of multi-modal features often relies on simple aggregators, such as summation and concatenation, which do not distinguish the importance of different feature interactions. To tackle these issues, we propose the FVTF (Fusing Visual and Textual Features) algorithm. First, we design a user history visual preference extraction module based on Query-Key-Value attention to model users’ historical interests from visual features. Second, we design a feature fusion and interaction module based on multi-head bit-wise attention to adaptively mine important feature combinations and update the higher-order attention fusion representation of features. Experiments on the MovieLens-1M dataset show that FVTF achieves the best performance compared with the benchmark recommendation algorithms. Public Library of Science 2023-06-29 /pmc/articles/PMC10310001/ /pubmed/37384736 http://dx.doi.org/10.1371/journal.pone.0287927 Text en © 2023 Hu et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Hu, Xuefeng Yu, Wenting Wu, Yun Chen, Yukang Multi-modal recommendation algorithm fusing visual and textual features |
title | Multi-modal recommendation algorithm fusing visual and textual features |
title_full | Multi-modal recommendation algorithm fusing visual and textual features |
title_fullStr | Multi-modal recommendation algorithm fusing visual and textual features |
title_full_unstemmed | Multi-modal recommendation algorithm fusing visual and textual features |
title_short | Multi-modal recommendation algorithm fusing visual and textual features |
title_sort | multi-modal recommendation algorithm fusing visual and textual features |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10310001/ https://www.ncbi.nlm.nih.gov/pubmed/37384736 http://dx.doi.org/10.1371/journal.pone.0287927 |
work_keys_str_mv | AT huxuefeng multimodalrecommendationalgorithmfusingvisualandtextualfeatures AT yuwenting multimodalrecommendationalgorithmfusingvisualandtextualfeatures AT wuyun multimodalrecommendationalgorithmfusingvisualandtextualfeatures AT chenyukang multimodalrecommendationalgorithmfusingvisualandtextualfeatures |
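The abstract in this record names two components: a Query-Key-Value attention module over the visual features of a user's interaction history, and a multi-head bit-wise attention module for fusing visual and textual features. The paper's implementation is not part of this record, so the following is only a minimal PyTorch sketch of what such modules commonly look like; the class names, layer sizes, and the sigmoid-gate formulation are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualPreferenceExtractor(nn.Module):
    """Query-Key-Value attention over the visual features of a user's interaction
    history; here the candidate item's visual feature is assumed to act as the query."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, candidate_visual: torch.Tensor, history_visual: torch.Tensor) -> torch.Tensor:
        # candidate_visual: (batch, dim); history_visual: (batch, seq_len, dim)
        q = self.q(candidate_visual).unsqueeze(1)                  # (batch, 1, dim)
        k = self.k(history_visual)                                 # (batch, seq_len, dim)
        v = self.v(history_visual)
        scores = torch.matmul(q, k.transpose(1, 2)) / k.size(-1) ** 0.5
        weights = F.softmax(scores, dim=-1)                        # attention over the history
        return torch.matmul(weights, v).squeeze(1)                 # (batch, dim) visual interest


class BitwiseAttentionFusion(nn.Module):
    """Multi-head bit-wise (element-wise) attention: each dimension of the fused
    multi-modal feature gets its own importance weight, so the fusion can emphasise
    informative feature combinations instead of a plain sum or concatenation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_heads)]
        )
        self.out = nn.Linear(num_heads * dim, dim)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (batch, dim) projection of concatenated visual + textual features
        gated = [torch.sigmoid(h(fused)) * fused for h in self.heads]  # per-dimension gates
        return self.out(torch.cat(gated, dim=-1))                      # higher-order fused representation


# Illustrative shapes only: batch of 8 users, 20-item histories, 64-dim visual features.
extractor = VisualPreferenceExtractor(dim=64)
fusion = BitwiseAttentionFusion(dim=128, num_heads=4)
visual_interest = extractor(torch.randn(8, 64), torch.randn(8, 20, 64))   # (8, 64)
fused_repr = fusion(torch.randn(8, 128))                                  # (8, 128)
```

"Bit-wise" is read here as element-wise gating over feature dimensions, following the abstract's contrast with sum and concatenation aggregators; the paper may define the operation differently.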