A Machine Learning Framework for Balancing Training Sets of Sensor Sequential Data Streams

Bibliographic Details
Main authors: Setiawan, Budi Darma, Serdült, Uwe, Kryssanov, Victor
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8540530/
https://www.ncbi.nlm.nih.gov/pubmed/34696105
http://dx.doi.org/10.3390/s21206892
Description
Summary: The recent explosive growth in the number of smart technologies that rely on data collected from sensors and processed with machine learning classifiers has made the training data imbalance problem more visible than ever before. Class-imbalanced sets used to train models of various events of interest are among the main reasons for a smart technology to work incorrectly or even to fail completely. This paper presents an attempt to resolve the imbalance problem in sensor sequential (time-series) data through training data augmentation. A framework powered by Unrolled Generative Adversarial Networks (Unrolled GAN) is developed and successfully used to balance the training data of smartphone accelerometer and gyroscope sensors in different contexts of road surface monitoring. Experiments with other sensor data from an open data collection are also conducted. It is demonstrated that the proposed approach improves classification performance in the case of heavily imbalanced data (the F1 score increased from 0.69 to 0.72, [Formula: see text], in the presented case study). However, the effect is negligible in the case of slightly imbalanced or inadequate training sets. This defines the limitations of the study, which are to be addressed in future work aimed at incorporating mechanisms for assessing training data quality into the proposed framework and at improving its computational efficiency.
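
As a rough illustration of the two quantities the summary relies on, the sketch below computes how many synthetic samples each class would need to match the majority class, and evaluates the F1 score from confusion counts. It is a minimal sketch and not the authors' framework: the function names, the class labels ("smooth", "pothole"), and all counts are hypothetical, and the generative model that would actually produce the synthetic sequences (an Unrolled GAN in the paper) is not implemented here.

```python
from collections import Counter


def synthetic_counts_needed(labels):
    """Per class, the number of synthetic samples that would bring
    it up to the size of the largest class (hypothetical helper)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical, heavily imbalanced label set (not data from the paper):
# 90 windows of one class against 10 of the other.
labels = ["smooth"] * 90 + ["pothole"] * 10

needed = synthetic_counts_needed(labels)
print(needed)

# Hypothetical confusion counts for the minority class.
print(f1_score(tp=6, fp=2, fn=4))
```

Balancing to the majority class size is only one possible target; whether full balancing helps depends on the data, which is consistent with the summary's remark that the effect is negligible for slightly imbalanced or inadequate training sets.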