Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction
Deep learning has a better output quality compared with traditional algorithms for video super-resolution (SR), but the network model needs large resources and has poor real-time performance. This paper focuses on solving the speed problem of SR; it achieves real-time SR by the collaborative design...
Main Authors: | Peng, Zhiyong; Du, Jiang; Qiao, Yulong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10223162/ https://www.ncbi.nlm.nih.gov/pubmed/37241678 http://dx.doi.org/10.3390/mi14051055 |
_version_ | 1785049875324338176 |
---|---|
author | Peng, Zhiyong; Du, Jiang; Qiao, Yulong |
collection | PubMed |
description | Deep learning has a better output quality compared with traditional algorithms for video super-resolution (SR), but the network model needs large resources and has poor real-time performance. This paper focuses on solving the speed problem of SR; it achieves real-time SR by the collaborative design of a deep learning video SR algorithm and GPU parallel acceleration. An algorithm combining deep learning networks with a lookup table (LUT) is proposed for video SR, which ensures both the SR effect and ease of GPU parallel acceleration. The computational efficiency of the GPU network-on-chip algorithm is improved to ensure real-time performance by three major GPU optimization strategies: storage access optimization, conditional branching function optimization, and threading optimization. Finally, the network-on-chip was implemented on an RTX 3090 GPU, and the validity of the algorithm was demonstrated through ablation experiments. In addition, SR performance is compared with existing classical algorithms based on standard datasets. The new algorithm was found to be more efficient than the SR-LUT algorithm. The average PSNR was 0.61 dB higher than the SR-LUT-V algorithm and 0.24 dB higher than the SR-LUT-S algorithm. At the same time, the speed of real video SR was tested. For a real video with a resolution of [Formula: see text], the proposed GPU network-on-chip achieved a speed of 42 FPS. The new method is 9.1 times faster than the original SR-LUT-S fast method, which was directly imported into the GPU for processing. |
format | Online Article Text |
id | pubmed-10223162 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10223162 2023-05-28 Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction Peng, Zhiyong; Du, Jiang; Qiao, Yulong Micromachines (Basel) Article MDPI 2023-05-16 /pmc/articles/PMC10223162/ /pubmed/37241678 http://dx.doi.org/10.3390/mi14051055 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Design of GPU Network-on-Chip for Real-Time Video Super-Resolution Reconstruction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10223162/ https://www.ncbi.nlm.nih.gov/pubmed/37241678 http://dx.doi.org/10.3390/mi14051055 |