
Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction

Deep learning delivers better output quality than traditional algorithms for video super-resolution (SR), but the network models require substantial resources and have poor real-time performance. This paper addresses the speed problem of SR: it achieves real-time SR through the collaborative design...


Bibliographic Details
Main Authors: Peng, Zhiyong; Du, Jiang; Qiao, Yulong
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10223162/
https://www.ncbi.nlm.nih.gov/pubmed/37241678
http://dx.doi.org/10.3390/mi14051055
author Peng, Zhiyong
Du, Jiang
Qiao, Yulong
collection PubMed
description Deep learning delivers better output quality than traditional algorithms for video super-resolution (SR), but the network models require substantial resources and have poor real-time performance. This paper addresses the speed problem of SR: it achieves real-time SR through the collaborative design of a deep learning video SR algorithm and GPU parallel acceleration. An algorithm combining deep learning networks with a lookup table (LUT) is proposed for video SR, which ensures both SR quality and ease of GPU parallel acceleration. Three major GPU optimization strategies improve the computational efficiency of the GPU network-on-chip algorithm to ensure real-time performance: storage access optimization, conditional branching function optimization, and threading optimization. Finally, the network-on-chip was implemented on an RTX 3090 GPU, and the validity of the algorithm was demonstrated through ablation experiments. In addition, SR performance was compared with existing classical algorithms on standard datasets. The new algorithm was found to be more efficient than the SR-LUT algorithm. Its average PSNR was 0.61 dB higher than that of the SR-LUT-V algorithm and 0.24 dB higher than that of the SR-LUT-S algorithm. The speed of SR on real video was also tested. For a real video with a resolution of [Formula: see text], the proposed GPU network-on-chip achieved a speed of 42 FPS. The new method is 9.1 times faster than the original SR-LUT-S fast method imported directly into the GPU for processing.
format Online
Article
Text
id pubmed-10223162
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10223162 2023-05-28 Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction Peng, Zhiyong Du, Jiang Qiao, Yulong Micromachines (Basel) Article MDPI 2023-05-16 /pmc/articles/PMC10223162/ /pubmed/37241678 http://dx.doi.org/10.3390/mi14051055 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10223162/
https://www.ncbi.nlm.nih.gov/pubmed/37241678
http://dx.doi.org/10.3390/mi14051055