Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model
Main Author: | Liu, Hongxia |
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9208963/ https://www.ncbi.nlm.nih.gov/pubmed/35733575 http://dx.doi.org/10.1155/2022/4626867 |
_version_ | 1784729829222907904 |
author | Liu, Hongxia |
author_facet | Liu, Hongxia |
author_sort | Liu, Hongxia |
collection | PubMed |
description | In this paper, a residual convolutional neural network is used to extract note features from music score images, mitigating the problem of model degradation; multiscale feature fusion then combines feature information from different levels of the same feature map to strengthen the model's feature representation ability. A network composed of a bidirectional simple recurrent unit (SRU) and a connectionist temporal classification (CTC) function is used to recognize notes; it parallelizes a large number of calculations, which speeds up training convergence, and it removes the need for strict label alignment in the dataset, which also lowers the requirements on the data. To address the weakness of existing common-subspace cross-modal retrieval methods in mining local consistency within modalities, a cross-modal retrieval method incorporating graph convolution is proposed. The K-nearest-neighbor algorithm is used to construct a modal graph for the samples of each modality; the original features of samples from different modalities are encoded through a symmetric graph convolutional encoding network and a symmetric multilayer fully connected encoding network, and the encoded features are fused and fed into the common subspace. Intramodal semantic constraints and intermodal modality-invariance constraints are jointly optimized in the common subspace so that samples from different modalities learn common representations with high local consistency and semantic consistency. The error values of the experimental results are used to illustrate the effect of parameters such as the number of iterations and the number of neurons on the network. To show more precisely that the generated music sequences closely resemble the original ones, the generated sequences are also framed, and their spectrograms and spectra are produced. Experimental accuracy is illustrated by comparing the spectrograms and spectra of the generated and original sequences, and genre classification predictions are also performed on the generated music to show that the network can generate music of different genres. |
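The graph-construction step described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it assumes Euclidean distance defines the K-nearest neighborhoods and uses a standard symmetrically normalized graph-convolution update; the function names, feature dimensions, and the choice of ReLU are all illustrative assumptions.

```python
import numpy as np

def knn_modal_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Build a symmetric k-nearest-neighbor adjacency matrix (with
    self-loops) for one modality's samples from their feature vectors."""
    n = features.shape[0]
    # Pairwise squared Euclidean distances via the expansion ||a-b||^2.
    sq = np.sum(features ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    adj = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbors of sample i, excluding i itself.
        order = np.argsort(dists[i])
        neighbors = [j for j in order if j != i][:k]
        adj[i, neighbors] = 1.0
    adj = np.maximum(adj, adj.T)   # symmetrize the graph
    np.fill_diagonal(adj, 1.0)     # add self-loops
    return adj

def gcn_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution encoding step:
    relu(D^{-1/2} A D^{-1/2} X W), with D the degree matrix of A."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ x @ w, 0.0)

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(8, 16))   # hypothetical image-modality features
adj = knn_modal_graph(img_feats, k=3)
w = rng.normal(size=(16, 4))           # hypothetical layer weights
encoded = gcn_layer(adj, img_feats, w)
print(encoded.shape)                   # (8, 4)
```

In the paper's setting, one such graph would be built per modality, and the graph-convolved features would be fused with the output of the fully connected encoder before projection into the common subspace.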
format | Online Article Text |
id | pubmed-9208963 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9208963 2022-06-21. Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model. Liu, Hongxia. Comput Intell Neurosci, Research Article. Hindawi 2022-06-13 /pmc/articles/PMC9208963/ /pubmed/35733575 http://dx.doi.org/10.1155/2022/4626867 Text en Copyright © 2022 Hongxia Liu. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Liu, Hongxia Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model |
title | Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model |
title_full | Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model |
title_fullStr | Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model |
title_full_unstemmed | Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model |
title_short | Design of Neural Network Model for Cross-Media Audio and Video Score Recognition Based on Convolutional Neural Network Model |
title_sort | design of neural network model for cross-media audio and video score recognition based on convolutional neural network model |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9208963/ https://www.ncbi.nlm.nih.gov/pubmed/35733575 http://dx.doi.org/10.1155/2022/4626867 |
work_keys_str_mv | AT liuhongxia designofneuralnetworkmodelforcrossmediaaudioandvideoscorerecognitionbasedonconvolutionalneuralnetworkmodel |