
Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU


Bibliographic Details
Main Authors: Al Ghadani, Ahmed Khamis Abdullah, Mateen, Waleeja, Ramaswamy, Rameshkumar G.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256376/
http://dx.doi.org/10.1007/978-3-030-49161-1_25
_version_ 1783539894237265920
author Al Ghadani, Ahmed Khamis Abdullah
Mateen, Waleeja
Ramaswamy, Rameshkumar G.
author_facet Al Ghadani, Ahmed Khamis Abdullah
Mateen, Waleeja
Ramaswamy, Rameshkumar G.
author_sort Al Ghadani, Ahmed Khamis Abdullah
collection PubMed
description With image processing, robots acquired visual perception skills, enabling them to become autonomous. Since the emergence of Artificial Intelligence (AI), sophisticated tasks such as object identification have become possible through inferencing Artificial Neural Networks (ANNs). However, Autonomous Mobile Robots (AMRs) are Embedded Systems (ESs) with limited on-board resources, so efficient ANN inferencing techniques are required for real-time performance. This paper presents the process of optimizing ANN inferencing using tensor-based optimization on an embedded Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA) platform for parallel acceleration on an ES. The research evaluates a renowned network, You-Only-Look-Once (YOLO), on the NVIDIA Jetson TX2 System-on-Module (SOM). The findings show a significant improvement in inferencing speed, in terms of Frames Per Second (FPS), of up to 3.5 times the non-optimized speed. Furthermore, the current CUDA model and TensorRT optimization techniques are studied, comments are made on their implementation for inferencing, and improvements are proposed based on the results acquired. These findings will assist ES developers, and industries will benefit from real-time inferencing performance for AMR automation solutions.
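Since the abstract centers on TensorRT-based optimization of YOLO on the Jetson TX2, a minimal sketch of that kind of workflow may help. This is not the authors' code: it assumes TensorRT 8's Python bindings (as bundled with JetPack) and a hypothetical ONNX export of YOLO named yolo.onnx.

import tensorrt as trt

# Parse an ONNX export of YOLO and build an FP16 engine; reduced-precision
# execution is one of the tensor-level optimizations TensorRT applies to
# exploit the embedded GPU.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolo.onnx", "rb") as f:  # hypothetical model file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # trade precision for speed on the TX2

engine = builder.build_serialized_network(network, config)
if engine is None:
    raise RuntimeError("engine build failed")
with open("yolo.engine", "wb") as f:
    f.write(engine)  # deserialize later with trt.Runtime for inferencing

At runtime the serialized engine is loaded once and executed per frame; that per-frame execution is where an FPS gain over non-optimized inferencing, such as the 3.5x reported here, would be measured.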
format Online
Article
Text
id pubmed-7256376
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7256376 2020-05-29 Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU Al Ghadani, Ahmed Khamis Abdullah Mateen, Waleeja Ramaswamy, Rameshkumar G. Artificial Intelligence Applications and Innovations Article With image processing, robots acquired visual perception skills, enabling them to become autonomous. Since the emergence of Artificial Intelligence (AI), sophisticated tasks such as object identification have become possible through inferencing Artificial Neural Networks (ANNs). However, Autonomous Mobile Robots (AMRs) are Embedded Systems (ESs) with limited on-board resources, so efficient ANN inferencing techniques are required for real-time performance. This paper presents the process of optimizing ANN inferencing using tensor-based optimization on an embedded Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA) platform for parallel acceleration on an ES. The research evaluates a renowned network, You-Only-Look-Once (YOLO), on the NVIDIA Jetson TX2 System-on-Module (SOM). The findings show a significant improvement in inferencing speed, in terms of Frames Per Second (FPS), of up to 3.5 times the non-optimized speed. Furthermore, the current CUDA model and TensorRT optimization techniques are studied, comments are made on their implementation for inferencing, and improvements are proposed based on the results acquired. These findings will assist ES developers, and industries will benefit from real-time inferencing performance for AMR automation solutions. 2020-05-06 /pmc/articles/PMC7256376/ http://dx.doi.org/10.1007/978-3-030-49161-1_25 Text en © IFIP International Federation for Information Processing 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Al Ghadani, Ahmed Khamis Abdullah
Mateen, Waleeja
Ramaswamy, Rameshkumar G.
Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU
title Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU
title_full Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU
title_fullStr Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU
title_full_unstemmed Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU
title_short Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU
title_sort tensor-based cuda optimization for ann inferencing using parallel acceleration on embedded gpu
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256376/
http://dx.doi.org/10.1007/978-3-030-49161-1_25
work_keys_str_mv AT alghadaniahmedkhamisabdullah tensorbasedcudaoptimizationforanninferencingusingparallelaccelerationonembeddedgpu
AT mateenwaleeja tensorbasedcudaoptimizationforanninferencingusingparallelaccelerationonembeddedgpu
AT ramaswamyrameshkumarg tensorbasedcudaoptimizationforanninferencingusingparallelaccelerationonembeddedgpu