
Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit

Bibliographic Details
Main author: Meyerov, Iosif
Language: eng
Published: 2019
Subjects:
Online access: http://cds.cern.ch/record/2692157
_version_ 1780963929101434880
author Meyerov, Iosif
author_facet Meyerov, Iosif
author_sort Meyerov, Iosif
collection CERN
description Deep learning is widely used in many problem areas, including computer vision, natural language processing, bioinformatics, and biomedicine. Training a neural network involves searching for the optimal weights of the model. It is a computationally intensive procedure, usually performed a limited number of times offline on servers equipped with powerful graphics cards. Inference of a deep model consists of forward propagation through the network. This repeated procedure should be executed as fast as possible on the available computational devices (CPUs, embedded devices). Many deep models are convolutional, so improving the performance of convolutional neural networks (CNNs) on Intel CPUs is a task of practical importance. The Intel Distribution of OpenVINO toolkit includes components that support the development of real-time visual applications. For efficient CNN inference on Intel platforms (Intel CPUs, Intel Processor Graphics, Intel FPGAs, Intel VPUs), the OpenVINO developers provide the Deep Learning Deployment Toolkit (DLDT). It contains tools for platform-independent optimizations of network topologies as well as low-level inference optimizations. In this talk, we analyze the performance and scalability of several toolkits that provide high-performance CNN-based deep learning inference on Intel platforms. We consider two typical data science problems: image classification (model: ResNet-50, dataset: ImageNet) and object detection (model: SSD300, dataset: PASCAL VOC 2012). First, we prepare a set of trained models for the following toolkits: the Intel Distribution of OpenVINO toolkit, Intel Caffe, Caffe, and TensorFlow. Then, a sufficiently large set of images is selected from each dataset so that the performance analysis gives accurate results. For each toolkit, built with the optimizing Intel compiler, the most suitable parameters (batch size, number of CPU cores used) are determined experimentally.
Further, computational experiments are carried out on the Intel Endeavor supercomputer using high-end Skylake and Cascade Lake CPUs. The main contributions of this talk are as follows: 1. A performance comparison of the Intel Distribution of OpenVINO toolkit and similar software for CNN-based deep learning inference on Intel platforms. 2. An analysis of the scaling efficiency of the OpenVINO toolkit on dozens of CPU cores in throughput mode. 3. An evaluation of the Intel AVX-512 VNNI performance gains on Intel Cascade Lake CPUs. 4. An analysis of modern CPU utilization in CNN-based deep learning inference using the Roofline model by means of Intel Advisor.
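The Roofline analysis mentioned in contribution 4 bounds a kernel's attainable performance by whichever resource saturates first: peak compute or memory bandwidth scaled by arithmetic intensity. A minimal sketch of that bound in Python follows; the peak FLOP rate and bandwidth figures are illustrative assumptions, not measurements from the talk.

```python
def roofline(peak_gflops, peak_bw_gbs, arithmetic_intensity):
    """Attainable performance (GFLOP/s) under the Roofline model.

    arithmetic_intensity is in FLOPs per byte moved from memory.
    The kernel is limited either by the machine's peak compute rate
    or by memory bandwidth times its arithmetic intensity.
    """
    return min(peak_gflops, arithmetic_intensity * peak_bw_gbs)


# Hypothetical machine balance (assumed numbers for illustration only):
PEAK = 3000.0  # GFLOP/s peak compute of a dual-socket server CPU node
BW = 200.0     # GB/s sustained DRAM bandwidth

# Ridge point: the arithmetic intensity above which a kernel
# becomes compute-bound rather than bandwidth-bound.
ridge = PEAK / BW  # 15.0 FLOPs/byte

# A low-intensity (memory-bound) kernel vs. a high-intensity
# convolution-like (compute-bound) kernel:
low = roofline(PEAK, BW, 0.5)    # 100.0 GFLOP/s, capped by bandwidth
high = roofline(PEAK, BW, 50.0)  # 3000.0 GFLOP/s, capped by peak compute
```

Intel Advisor automates exactly this comparison per loop, plotting measured kernels against the machine's compute and bandwidth roofs.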
id cern-2692157
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2692157 2022-11-02T22:24:39Z http://cds.cern.ch/record/2692157 eng Meyerov, Iosif Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit IXPUG 2019 Annual Conference at CERN other events or meetings oai:cds.cern.ch:2692157 2019
spellingShingle other events or meetings
Meyerov, Iosif
Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit
title Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit
title_full Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit
title_fullStr Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit
title_full_unstemmed Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit
title_short Performance and Scalability Analysis of CNN-based Deep Learning Inference in the Intel Distribution of OpenVINO Toolkit
title_sort performance and scalability analysis of cnn-based deep learning inference in the intel distribution of openvino toolkit
topic other events or meetings
url http://cds.cern.ch/record/2692157
work_keys_str_mv AT meyeroviosif performanceandscalabilityanalysisofcnnbaseddeeplearninginferenceintheinteldistributionofopenvinotoolkit
AT meyeroviosif ixpug2019annualconferenceatcern