A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data

This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was sequentially executed in three stages: data pre-processing, feature engineering, and model evaluati...

Bibliographic Details
Main Authors: Hussain, Saddam; Mustafa, Mohd Wazir; Al-Shqeerat, Khalil Hamdi Ateyeh; Saeed, Faisal; Al-rimy, Bander Ali Saleh
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704372/
https://www.ncbi.nlm.nih.gov/pubmed/34960516
http://dx.doi.org/10.3390/s21248423
author Hussain, Saddam
Mustafa, Mohd Wazir
Al-Shqeerat, Khalil Hamdi Ateyeh
Saeed, Faisal
Al-rimy, Bander Ali Saleh
collection PubMed
description This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was sequentially executed in three stages: data pre-processing, feature engineering, and model evaluation. It utilized the random forest algorithm-based imputation technique initially to impute the missing data entries in the acquired smart meter dataset. In the second phase, the majority weighted minority oversampling technique (MWMOTE) algorithm was used to avoid an unequal distribution of data samples among different classes. The time-series feature-extraction library and whale optimization algorithm were utilized to extract and select the most relevant features from the kWh reading of consumers. Once the most relevant features were acquired, the model training and testing process was initiated by using the NGBoost algorithm to classify the consumers into two distinct categories (“Healthy” and “Theft”). Finally, each input feature’s impact (positive or negative) in predicting the target variable was recognized with the tree SHAP additive-explanations algorithm. The proposed framework achieved an accuracy of 93%, recall of 91%, and precision of 95%, which was greater than all the competing models, and thus validated its efficacy and significance in the studied field of research.
format Online
Article
Text
id pubmed-8704372
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8704372 2021-12-25
A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data
Hussain, Saddam
Mustafa, Mohd Wazir
Al-Shqeerat, Khalil Hamdi Ateyeh
Saeed, Faisal
Al-rimy, Bander Ali Saleh
Sensors (Basel)
Article
MDPI 2021-12-17
/pmc/articles/PMC8704372/
/pubmed/34960516
http://dx.doi.org/10.3390/s21248423
Text en
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704372/
https://www.ncbi.nlm.nih.gov/pubmed/34960516
http://dx.doi.org/10.3390/s21248423
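
The description field reports accuracy, recall, and precision for the two-class ("Healthy" vs. "Theft") task. As a reading aid, the sketch below shows how those three metrics are conventionally computed from confusion-matrix counts. It is not taken from the article: the class name `ConfusionCounts`, the choice of "Theft" as the positive class, and the example counts are all assumptions made for illustration, and the counts are hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, assuming "Theft" is the positive class."""
    tp: int  # "Theft" consumers correctly flagged as "Theft"
    fp: int  # "Healthy" consumers wrongly flagged as "Theft"
    fn: int  # "Theft" consumers wrongly labelled "Healthy"
    tn: int  # "Healthy" consumers correctly labelled "Healthy"


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all consumers that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Fraction of consumers flagged as "Theft" that really are "Theft"."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of real "Theft" consumers that were flagged."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


# Hypothetical counts for illustration only (NOT results from the article).
example = ConfusionCounts(tp=45, fp=5, fn=15, tn=135)

if __name__ == "__main__":
    print(f"accuracy:  {accuracy(example):.2f}")
    print(f"precision: {precision(example):.2f}")
    print(f"recall:    {recall(example):.2f}")
```

In a theft-detection setting, precision and recall are usually read together: a high precision with a lower recall means few honest consumers are flagged but some theft cases are missed.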