
A Dual-Path Cross-Modal Network for Video-Music Retrieval

In recent years, with the development of the internet, video has become increasingly widespread in daily life. Adding harmonious music to a video is gradually becoming an artistic task. However, adding music manually takes a lot of time and effort, so we propose a method to recommend background musi...


Bibliographic Details
Main Authors:	Gu, Xin, Shen, Yinghua, Lv, Chaohui
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9861296/ https://www.ncbi.nlm.nih.gov/pubmed/36679605 http://dx.doi.org/10.3390/s23020805

_version_	1784874805832450048
author	Gu, Xin Shen, Yinghua Lv, Chaohui
author_facet	Gu, Xin Shen, Yinghua Lv, Chaohui
author_sort	Gu, Xin
collection	PubMed
description	In recent years, with the development of the internet, video has become increasingly widespread in daily life. Adding harmonious music to a video is gradually becoming an artistic task. However, adding music manually takes a lot of time and effort, so we propose a method to recommend background music for videos. The emotional message of music is rarely taken into account in current work, but it is crucial for video-music retrieval. To address this, we design two paths to process content information and emotional information across modalities. Based on the characteristics of video and music, we design different feature extraction schemes and common representation spaces. In the content path, a pre-trained network is used as the feature extraction network. As these features contain some redundant information, we use an encoder–decoder structure for dimensionality reduction, in which the encoder weights are shared to obtain shared content features for video and music. In the emotion path, an emotion key-frame scheme is used for video and a channel attention mechanism is used for music in order to obtain the emotion information effectively. We also add an emotion distinguish loss to ensure that the network acquires the emotion information effectively. More importantly, we propose a way to combine content information with emotional information: content features are first concatenated with sentiment features and then passed through a fused shared space structured as an MLP to obtain more effective fused shared features. In addition, a polarity penalty factor is added to the classical metric loss function to make it more suitable for this task. Experiments show that this dual-path video-music retrieval network can effectively merge information. Compared with existing methods, Recall@1 on the retrieval task increases by 3.94.
format	Online Article Text
id	pubmed-9861296
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9861296 2023-01-22 A Dual-Path Cross-Modal Network for Video-Music Retrieval Gu, Xin Shen, Yinghua Lv, Chaohui Sensors (Basel) Article MDPI 2023-01-10 /pmc/articles/PMC9861296/ /pubmed/36679605 http://dx.doi.org/10.3390/s23020805 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Gu, Xin Shen, Yinghua Lv, Chaohui A Dual-Path Cross-Modal Network for Video-Music Retrieval
title	A Dual-Path Cross-Modal Network for Video-Music Retrieval
title_full	A Dual-Path Cross-Modal Network for Video-Music Retrieval
title_fullStr	A Dual-Path Cross-Modal Network for Video-Music Retrieval
title_full_unstemmed	A Dual-Path Cross-Modal Network for Video-Music Retrieval
title_short	A Dual-Path Cross-Modal Network for Video-Music Retrieval
title_sort	dual-path cross-modal network for video-music retrieval
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9861296/ https://www.ncbi.nlm.nih.gov/pubmed/36679605 http://dx.doi.org/10.3390/s23020805
work_keys_str_mv	AT guxin adualpathcrossmodalnetworkforvideomusicretrieval AT shenyinghua adualpathcrossmodalnetworkforvideomusicretrieval AT lvchaohui adualpathcrossmodalnetworkforvideomusicretrieval AT guxin dualpathcrossmodalnetworkforvideomusicretrieval AT shenyinghua dualpathcrossmodalnetworkforvideomusicretrieval AT lvchaohui dualpathcrossmodalnetworkforvideomusicretrieval
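
The description reports its result as a gain in Recall@1. As a minimal sketch of how Recall@K is commonly computed for cross-modal retrieval (not the authors' evaluation code), the Python below assumes a hypothetical score matrix in which row i holds the similarity of video query i to every music candidate, and assumes the correct match for query i is candidate i. The function name `recall_at_k` and the toy scores are illustrative only.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose ground-truth item appears in the top k.

    similarity[i][j] is the score between video query i and music
    candidate j. Assumption: the true match for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores for three video queries against three music candidates.
scores = [
    [0.9, 0.1, 0.3],  # query 0: candidate 0 ranks first
    [0.2, 0.4, 0.8],  # query 1: candidate 1 ranks second
    [0.1, 0.2, 0.7],  # query 2: candidate 2 ranks first
]
r1 = recall_at_k(scores, k=1)
r2 = recall_at_k(scores, k=2)
```

With these toy scores, two of the three queries rank their true match first, so Recall@1 is 2/3, and all three have it in the top two, so Recall@2 is 1. Whether the paper's 3.94 figure is expressed in percentage points on this kind of scale is not stated in the record.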
