Cargando…

Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction

Deep learning has a better output quality compared with traditional algorithms for video super-resolution (SR), but the network model needs large resources and has poor real-time performance. This paper focuses on solving the speed problem of SR; it achieves real-time SR by the collaborative design...

Descripción completa

Detalles Bibliográficos
Autores principales: Peng, Zhiyong, Du, Jiang, Qiao, Yulong
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10223162/
https://www.ncbi.nlm.nih.gov/pubmed/37241678
http://dx.doi.org/10.3390/mi14051055
Descripción
Sumario:Deep learning has a better output quality compared with traditional algorithms for video super-resolution (SR), but the network model needs large resources and has poor real-time performance. This paper focuses on solving the speed problem of SR; it achieves real-time SR by the collaborative design of a deep learning video SR algorithm and GPU parallel acceleration. An algorithm combining deep learning networks with a lookup table (LUT) is proposed for the video SR, which ensures both the SR effect and ease of GPU parallel acceleration. The computational efficiency of the GPU network-on-chip algorithm is improved to ensure real-time performance by three major GPU optimization strategies: storage access optimization, conditional branching function optimization, and threading optimization. Finally, the network-on-chip was implemented on a RTX 3090 GPU, and the validity of the algorithm was demonstrated through ablation experiments. In addition, SR performance is compared with existing classical algorithms based on standard datasets. The new algorithm was found to be more efficient than the SR-LUT algorithm. The average PSNR was 0.61 dB higher than the SR-LUT-V algorithm and 0.24 dB higher than the SR-LUT-S algorithm. At the same time, the speed of real video SR was tested. For a real video with a resolution of  [Formula: see text] , the proposed GPU network-on-chip achieved a speed of 42 FPS. The new method is 9.1 times faster than the original SR-LUT-S fast method, which was directly imported into the GPU for processing.