
A Dual-Path Cross-Modal Network for Video-Music Retrieval

Bibliographic Details
Main Authors: Gu, Xin, Shen, Yinghua, Lv, Chaohui
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9861296/
https://www.ncbi.nlm.nih.gov/pubmed/36679605
http://dx.doi.org/10.3390/s23020805
author Gu, Xin
Shen, Yinghua
Lv, Chaohui
collection PubMed
description In recent years, with the growth of the internet, video has become increasingly widespread in daily life. Adding harmonious music to a video is gradually becoming an artistic task. However, adding music manually takes considerable time and effort, so we propose a method to recommend background music for videos. The emotional message of music is rarely taken into account in current work, but it is crucial for video-music retrieval. To address this, we design two paths that process content information and emotional information across modalities. Based on the characteristics of video and music, we design separate feature extraction schemes and common representation spaces. In the content path, a pre-trained network serves as the feature extraction network. Because these features contain some redundant information, we use an encoder-decoder structure for dimensionality reduction, in which the encoder weights are shared to obtain shared content features for video and music. In the emotion path, an emotion key-frame scheme is used for video and a channel attention mechanism is used for music in order to capture emotion information effectively. We also add an emotion-distinguishing loss to ensure that the network acquires the emotion information. More importantly, we propose a way to combine content information with emotional information: content features are first concatenated with emotion features and then passed through a fused shared space, structured as an MLP, to obtain more effective fused shared features. In addition, a polarity penalty factor is added to the classical metric loss function to make it more suitable for this task. Experiments show that this dual-path video-music retrieval network can effectively merge information. Compared with existing methods, it increases Recall@1 by 3.94.
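The fusion step and the penalized metric loss described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class `FusedSharedSpace`, the two-layer MLP shape, the cosine distance, the triplet form of the metric loss, and the way `penalty` enlarges the margin for opposite-polarity negatives are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0  # leave an all-zero vector unchanged
    return [v / norm for v in x]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class FusedSharedSpace:
    """Hypothetical two-layer MLP; the same weights serve video and music."""

    def __init__(self, content_dim, emotion_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.layer1 = init_layer(content_dim + emotion_dim, hidden_dim, rng)
        self.layer2 = init_layer(hidden_dim, out_dim, rng)

    def __call__(self, content_feat, emotion_feat):
        fused = list(content_feat) + list(emotion_feat)  # concatenate the two paths
        hidden = relu(linear(fused, *self.layer1))
        return l2_normalize(linear(hidden, *self.layer2))


def cosine_distance(a, b):
    """Cosine distance for vectors that are already unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def polarity_triplet_loss(anchor, positive, negative, same_polarity,
                          margin=0.2, penalty=2.0):
    """Triplet loss whose margin grows by `penalty` (an assumed mechanism)
    when the negative sample has the opposite emotion polarity."""
    effective_margin = margin if same_polarity else margin * penalty
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + effective_margin)


space = FusedSharedSpace(content_dim=4, emotion_dim=2, hidden_dim=8, out_dim=3)
video_vec = space([0.3, 0.1, 0.7, 0.2], [0.9, 0.1])
music_vec = space([0.2, 0.2, 0.6, 0.1], [0.8, 0.2])
```

Because both modalities pass through the same `space` object, `video_vec` and `music_vec` live in one embedding space and can be compared directly with `cosine_distance`; a trained system would learn the weights rather than draw them at random.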
format Online
Article
Text
id pubmed-9861296
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9861296 2023-01-22 A Dual-Path Cross-Modal Network for Video-Music Retrieval Gu, Xin Shen, Yinghua Lv, Chaohui Sensors (Basel) Article MDPI 2023-01-10 /pmc/articles/PMC9861296/ /pubmed/36679605 http://dx.doi.org/10.3390/s23020805 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Dual-Path Cross-Modal Network for Video-Music Retrieval
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9861296/
https://www.ncbi.nlm.nih.gov/pubmed/36679605
http://dx.doi.org/10.3390/s23020805