A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data
This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was sequentially executed in three stages: data pre-processing, feature engineering, and model evaluati...
Main authors: | Hussain, Saddam; Mustafa, Mohd Wazir; Al-Shqeerat, Khalil Hamdi Ateyeh; Saeed, Faisal; Al-rimy, Bander Ali Saleh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704372/ https://www.ncbi.nlm.nih.gov/pubmed/34960516 http://dx.doi.org/10.3390/s21248423 |
_version_ | 1784621691906818048 |
---|---|
author | Hussain, Saddam Mustafa, Mohd Wazir Al-Shqeerat, Khalil Hamdi Ateyeh Saeed, Faisal Al-rimy, Bander Ali Saleh |
author_sort | Hussain, Saddam |
collection | PubMed |
description | This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was sequentially executed in three stages: data pre-processing, feature engineering, and model evaluation. It utilized the random forest algorithm-based imputation technique initially to impute the missing data entries in the acquired smart meter dataset. In the second phase, the majority weighted minority oversampling technique (MWMOTE) algorithm was used to avoid an unequal distribution of data samples among different classes. The time-series feature-extraction library and whale optimization algorithm were utilized to extract and select the most relevant features from the kWh reading of consumers. Once the most relevant features were acquired, the model training and testing process was initiated by using the NGBoost algorithm to classify the consumers into two distinct categories (“Healthy” and “Theft”). Finally, each input feature’s impact (positive or negative) in predicting the target variable was recognized with the tree SHAP additive-explanations algorithm. The proposed framework achieved an accuracy of 93%, recall of 91%, and precision of 95%, which was greater than all the competing models, and thus validated its efficacy and significance in the studied field of research. |
format | Online Article Text |
id | pubmed-8704372 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8704372 2021-12-25 A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data Hussain, Saddam Mustafa, Mohd Wazir Al-Shqeerat, Khalil Hamdi Ateyeh Saeed, Faisal Al-rimy, Bander Ali Saleh Sensors (Basel) Article This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was sequentially executed in three stages: data pre-processing, feature engineering, and model evaluation. It utilized the random forest algorithm-based imputation technique initially to impute the missing data entries in the acquired smart meter dataset. In the second phase, the majority weighted minority oversampling technique (MWMOTE) algorithm was used to avoid an unequal distribution of data samples among different classes. The time-series feature-extraction library and whale optimization algorithm were utilized to extract and select the most relevant features from the kWh reading of consumers. Once the most relevant features were acquired, the model training and testing process was initiated by using the NGBoost algorithm to classify the consumers into two distinct categories (“Healthy” and “Theft”). Finally, each input feature’s impact (positive or negative) in predicting the target variable was recognized with the tree SHAP additive-explanations algorithm. The proposed framework achieved an accuracy of 93%, recall of 91%, and precision of 95%, which was greater than all the competing models, and thus validated its efficacy and significance in the studied field of research. MDPI 2021-12-17 /pmc/articles/PMC8704372/ /pubmed/34960516 http://dx.doi.org/10.3390/s21248423 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704372/ https://www.ncbi.nlm.nih.gov/pubmed/34960516 http://dx.doi.org/10.3390/s21248423 |
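
The description above reports accuracy, recall, and precision for a two-class problem ("Healthy" vs. "Theft"). As a reading aid, the following is a minimal sketch of how those three metrics are conventionally derived from confusion-matrix counts, assuming "Theft" is treated as the positive class. The function name `classification_metrics` and the counts in the usage example are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, and recall from confusion-matrix counts.

    tp: "Theft" consumers correctly flagged as "Theft"
    fp: "Healthy" consumers wrongly flagged as "Theft"
    tn: "Healthy" consumers correctly left as "Healthy"
    fn: "Theft" consumers missed (labelled "Healthy")
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, NOT results from the paper.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision penalizes false alarms against honest consumers, while recall penalizes missed theft cases, which is why records like this one usually report both alongside accuracy.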