Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning

Detecting fraud related to electricity consumption is usually a difficult challenge as the input datasets are sometimes unreliable due to missing and inconsistent records, faults, misinterpretation of meter reading remarks, status, etc. In this paper, we obtain meaningful insights from fraud detecti...

Bibliographic Details
Main authors: Oprea, Simona-Vasilica; Bâra, Adela
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8885834/
https://www.ncbi.nlm.nih.gov/pubmed/35228648
http://dx.doi.org/10.1038/s41598-022-07337-7
_version_ 1784660533337653248
author Oprea, Simona-Vasilica
Bâra, Adela
collection PubMed
description Detecting fraud related to electricity consumption is usually a difficult challenge because the input datasets are sometimes unreliable due to missing and inconsistent records, faults, and misinterpretation of meter reading remarks, status, etc. In this paper, we obtain meaningful insights from fraud detection using real datasets of Tunisian electricity consumption metered by conventional meters. We propose an extensive feature engineering approach using structured query language (SQL) analytic functions. Furthermore, double merging of datasets reveals more dimensions of the data, allowing better detection of irregularities in consumption. We analyze the results of several machine learning (ML) algorithms that manage cases of weakly correlated features and highly unbalanced datasets. The skewness of the target is treated as a regular characteristic of the input data because most consumers are fair and only a small portion attempt to mislead the utility companies by tampering with metering devices. Our fraud detection solutions combine classifiers with an anomaly detection feature obtained with an unsupervised ML algorithm (Isolation Forest) and extensive feature engineering using SQL analytic functions on large datasets. Several techniques for feature processing enhanced the Area Under the Curve score for the Decision Tree algorithm from 0.68 to 0.99.
format Online
Article
Text
id pubmed-8885834
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8885834 2022-03-01 Oprea, Simona-Vasilica; Bâra, Adela. Sci Rep. Article.
Nature Publishing Group UK 2022-02-28 /pmc/articles/PMC8885834/ /pubmed/35228648 http://dx.doi.org/10.1038/s41598-022-07337-7 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning
topic Article