
Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition

Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users i...


Bibliographic Details
Main Authors:	Alharbi, Fayez; Ouarbya, Lahcen; Ward, Jamie A
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963022/ https://www.ncbi.nlm.nih.gov/pubmed/35214275 http://dx.doi.org/10.3390/s22041373

author	Alharbi, Fayez Ouarbya, Lahcen Ward, Jamie A
collection	PubMed
description	Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users in their daily lives in application areas such as healthcare, physical therapy, and fitness. One of the main challenges for HAR, particularly when using supervised learning methods, is obtaining balanced data for algorithm optimisation and testing. As people perform some activities more than others (e.g., walk more than run), HAR datasets are typically imbalanced. The lack of dataset representation from minority classes hinders the ability of HAR classifiers to sufficiently capture new instances of those activities. We introduce three novel hybrid sampling strategies to generate more diverse synthetic samples to overcome the class imbalance problem. The first strategy, which we call the distance-based method (DBM), combines the Synthetic Minority Oversampling Technique (SMOTE) with Random_SMOTE, both of which are built around the k-nearest neighbors (KNN) algorithm. The second technique, referred to as the noise detection-based method (NDBM), combines SMOTE Tomek links (SMOTE_Tomeklinks) and the modified synthetic minority oversampling technique (MSMOTE). The third approach, which we call the cluster-based method (CBM), combines Cluster-Based Synthetic Oversampling (CBSO) and Proximity Weighted Synthetic Oversampling Technique (ProWSyn). We compare the performance of the proposed hybrid methods to the individual constituent methods and baseline using accelerometer data from three commonly used benchmark datasets. We show that DBM, NDBM, and CBM reduce the impact of class imbalance and improve F1 scores by 9–20 percentage points compared to their constituent sampling methods. CBM performs significantly better than the others under a Friedman test; however, DBM has lower computational requirements.
format	Online Article Text
id	pubmed-8963022
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8963022 2022-03-30 Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition Alharbi, Fayez Ouarbya, Lahcen Ward, Jamie A Sensors (Basel) Article MDPI 2022-02-11 /pmc/articles/PMC8963022/ /pubmed/35214275 http://dx.doi.org/10.3390/s22041373 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition
topic	Article
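
The description above notes that SMOTE is built around k-nearest neighbors. As a rough illustration of that idea only, the sketch below generates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbors. It is not the authors' DBM, NDBM, or CBM implementation; the function name `smote_sketch` and the parameters `n_new`, `k`, and `seed` are hypothetical and chosen here for illustration.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Plain SMOTE-style interpolation (illustrative sketch, not the paper's code).

    X_min : array of shape (n_samples, n_features) holding minority-class samples.
    n_new : number of synthetic samples to generate.
    k     : number of nearest minority neighbors considered per sample.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic point lies on a segment between two existing minority samples, the output stays inside the region those samples span. In practice a maintained library implementation would normally be preferred over a hand-written sketch like this one.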
