Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices

The Internet of Things (IoT) consists of small devices or a network of sensors, which permanently generate huge amounts of data. Usually, they have limited resources, either computing power or memory, which means that raw data are transferred to central systems or the cloud for analysis. Lately, the...

Bibliographic Details
Main authors: Huč, Aleks, Šalej, Jakob, Trebar, Mira
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309800/
https://www.ncbi.nlm.nih.gov/pubmed/34300686
http://dx.doi.org/10.3390/s21144946
author Huč, Aleks
Šalej, Jakob
Trebar, Mira
collection PubMed
description The Internet of Things (IoT) consists of small devices or a network of sensors, which permanently generate huge amounts of data. Usually, they have limited resources, either computing power or memory, which means that raw data are transferred to central systems or the cloud for analysis. Lately, the idea of moving intelligence to the IoT is becoming feasible, with machine learning (ML) moved to edge devices. The aim of this study is to provide an experimental analysis of processing a large imbalanced dataset (DS2OS), split into a training dataset (80%) and a test dataset (20%). The training dataset was reduced by randomly selecting a smaller number of samples to create new datasets Di (i = 1, 2, 5, 10, 15, 20, 40, 60, 80%). These datasets were then used with several machine learning algorithms to identify the size at which the performance metrics saturate and classification results stop improving, with an F1 score of 0.95 or higher; this happened at 20% of the training dataset. Two further solutions are given that reduce the number of samples to provide a balanced dataset. In the first, datasets DRi consist of all anomalous samples in seven classes and a reduced majority class (‘NL’) with i = 0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 20 percent of randomly selected samples. In the second, datasets DCi are generated from representative samples of the training dataset determined with clustering. All three dataset reduction methods showed comparable performance results. Further evaluation of training times and memory usage on a Raspberry Pi 4 shows that it is possible to run ML algorithms with limited-size datasets on edge devices.
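The first two reduction schemes named in the description (Di and DRi) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the fixed seed are assumptions made for illustration, and only the 'NL' majority label and the percentage values come from the abstract. The clustering-based DCi variant is not shown.

```python
import random
from collections import Counter


def random_subset(samples, percent, seed=0):
    """Di-style reduction: keep `percent` % of all samples, chosen at random."""
    rng = random.Random(seed)
    k = min(len(samples), max(1, round(len(samples) * percent / 100)))
    return rng.sample(samples, k)


def reduce_majority(samples, percent, majority_label="NL", seed=0):
    """DRi-style reduction: keep every anomalous sample and only
    `percent` % of the majority class, chosen at random."""
    rng = random.Random(seed)
    majority = [s for s in samples if s[1] == majority_label]
    anomalies = [s for s in samples if s[1] != majority_label]
    k = min(len(majority), max(1, round(len(majority) * percent / 100)))
    return anomalies + rng.sample(majority, k)


# Toy stand-in for the training data (hypothetical, not DS2OS):
# (feature, label) pairs with 1000 normal and 20 anomalous samples.
toy = [(i, "NL") for i in range(1000)] + [(i, "anomaly") for i in range(20)]

d20 = random_subset(toy, 20)    # 20 % of all samples
dr5 = reduce_majority(toy, 5)   # all anomalies + 5 % of 'NL'
dr5_counts = Counter(label for _, label in dr5)
print(len(d20), dr5_counts)
```

On the toy data, the Di subset keeps the original class imbalance on average, while the DRi subset keeps all 20 anomalous samples and shrinks only the 'NL' class, which is what moves the dataset toward balance.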
format Online
Article
Text
id pubmed-8309800
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8309800 2021-07-25 Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices Huč, Aleks Šalej, Jakob Trebar, Mira Sensors (Basel) Article MDPI 2021-07-20 /pmc/articles/PMC8309800/ /pubmed/34300686 http://dx.doi.org/10.3390/s21144946 Text en
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Analysis of Machine Learning Algorithms for Anomaly Detection on Edge Devices
topic Article