ReS(2)tAC—UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices

With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware that was capable of high-performance computing, while at the same time preserving a l...

Bibliographic Details
Main Authors: Ruf, Boitumelo, Mohrs, Jonas, Weinmann, Martin, Hinz, Stefan, Beyerer, Jürgen
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8201159/
https://www.ncbi.nlm.nih.gov/pubmed/34200481
http://dx.doi.org/10.3390/s21113938
_version_ 1783707752559804416
author Ruf, Boitumelo
Mohrs, Jonas
Weinmann, Martin
Hinz, Stefan
Beyerer, Jürgen
collection PubMed
description With the emergence of low-cost robotic systems, such as unmanned aerial vehicles, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption essential for embedded systems. However, the increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the widely used Semi-Global Matching algorithm. We optimize the algorithm for embedded CUDA GPUs through massively parallel computing, and for embedded ARM CPUs through NEON intrinsics for vectorized SIMD processing. We have evaluated our approach with different configurations on two public stereo benchmark datasets and demonstrate that it can reach an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration reaches up to 46 FPS at VGA image resolution. Finally, in a use-case-specific qualitative evaluation, we have evaluated the power consumption of our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrix 210v2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for real-time stereo processing onboard a UAV.
format Online
Article
Text
id pubmed-8201159
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8201159 2021-06-15 Sensors (Basel) Article MDPI 2021-06-07 /pmc/articles/PMC8201159/ /pubmed/34200481 http://dx.doi.org/10.3390/s21113938 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title ReS(2)tAC—UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices
topic Article