A Joint Model Provisioning and Request Dispatch Solution for Low-Latency Inference Services on Edge

With the advancement of machine learning, a growing number of mobile users rely on machine learning inference to make time-sensitive and safety-critical decisions, so the demand for high-quality, low-latency inference services at the network edge has become key to a modern intelligent society. This paper proposes a novel solution that jointly provisions machine learning models and dispatches inference requests to reduce inference latency on edge nodes. Existing solutions either direct inference requests to the nearest edge node to save network latency or balance the workload across edge nodes to reduce queuing and computing time. The proposed solution instead provisions each edge node with the optimal number and type of inference instances under a holistic consideration of networking, computing, and memory resources, so that mobile users can be directed to the edge node offering the minimal serving latency. The solution has been implemented with TensorFlow Serving and Kubernetes on an edge cluster, and in both simulation and testbed experiments under various system settings it consistently achieved lower latency than simply searching for the best edge node to serve each inference request.
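To make the abstract's dispatch idea concrete, below is a minimal, hypothetical sketch of latency-aware request dispatch: each candidate edge node's serving latency is estimated as network plus queuing plus computing time, and the request goes to the node with the smallest estimate. This is not the authors' code; all names, numbers, and the linear queuing model are illustrative assumptions.

    # Hypothetical latency-aware dispatch sketch (Python). Everything here
    # is an illustrative assumption, not the paper's actual algorithm.
    from dataclasses import dataclass

    @dataclass
    class EdgeNode:
        name: str
        network_latency_ms: float  # estimated user-to-node network latency
        queue_length: int          # requests currently waiting at the node
        service_time_ms: float     # mean per-request computing time

    def estimated_serving_latency(node: EdgeNode) -> float:
        # Serving latency = network transfer + queuing + computation.
        queuing_ms = node.queue_length * node.service_time_ms
        return node.network_latency_ms + queuing_ms + node.service_time_ms

    def dispatch(nodes: list[EdgeNode]) -> EdgeNode:
        # Send the request to the node with the lowest estimated total
        # latency, rather than simply the nearest node.
        return min(nodes, key=estimated_serving_latency)

    nodes = [
        EdgeNode("edge-a", network_latency_ms=5.0, queue_length=8, service_time_ms=12.0),
        EdgeNode("edge-b", network_latency_ms=15.0, queue_length=1, service_time_ms=12.0),
    ]
    print(dispatch(nodes).name)  # "edge-b": farther away, but far less queuing delay

In this toy example the nearest node ("edge-a") loses because its queue dominates total latency, which is the intuition behind dispatching on estimated serving latency rather than proximity alone.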

Bibliographic Details
Main Authors: Prasad, Anish; Mofjeld, Carl; Peng, Yang
Format: Online Article (Text)
Language: English
Published: Sensors (Basel), MDPI, 2 October 2021
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8513104/
https://www.ncbi.nlm.nih.gov/pubmed/34640914
http://dx.doi.org/10.3390/s21196594
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).