Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning
Detecting fraud related to electricity consumption is usually a difficult challenge as the input datasets are sometimes unreliable due to missing and inconsistent records, faults, misinterpretation of meter reading remarks, status, etc. In this paper, we obtain meaningful insights from fraud detecti...
Main authors: | Oprea, Simona-Vasilica; Bâra, Adela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8885834/ https://www.ncbi.nlm.nih.gov/pubmed/35228648 http://dx.doi.org/10.1038/s41598-022-07337-7 |
_version_ | 1784660533337653248 |
---|---|
author | Oprea, Simona-Vasilica Bâra, Adela |
author_facet | Oprea, Simona-Vasilica Bâra, Adela |
author_sort | Oprea, Simona-Vasilica |
collection | PubMed |
description | Detecting fraud related to electricity consumption is usually a difficult challenge as the input datasets are sometimes unreliable due to missing and inconsistent records, faults, misinterpretation of meter reading remarks, status, etc. In this paper, we obtain meaningful insights from fraud detection using real datasets of Tunisian electricity consumption metered by conventional meters. We propose an extensive feature engineering approach using the structured query language (SQL) analytic functions. Furthermore, double merging of datasets reveals more dimensions of the data, allowing better detection of irregularities in consumption. We analyze the results of several machine learning (ML) algorithms that manage cases of weakly correlated features and highly unbalanced datasets. The skewness of the target is approached as a regular characteristic of the input data because most consumers are fair and only a small portion attempt to mislead the utility companies by tampering with metering devices. Our fraud detection solutions consist of combining classifiers with an anomaly detection feature obtained with an unsupervised ML algorithm, Isolation Forest, and extensive feature engineering using SQL analytic functions on large datasets. Several techniques for feature processing enhanced the Area Under the Curve score for the Decision Tree algorithm from 0.68 to 0.99. |
format | Online Article Text |
id | pubmed-8885834 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-88858342022-03-01 Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning Oprea, Simona-Vasilica Bâra, Adela Sci Rep Article Detecting fraud related to electricity consumption is usually a difficult challenge as the input datasets are sometimes unreliable due to missing and inconsistent records, faults, misinterpretation of meter reading remarks, status, etc. In this paper, we obtain meaningful insights from fraud detection using real datasets of Tunisian electricity consumption metered by conventional meters. We propose an extensive feature engineering approach using the structured query language (SQL) analytic functions. Furthermore, double merging of datasets reveals more dimensions of the data allowing better detection of irregularities in consumption. We analyze the results of several machine learning (ML) algorithms that manage cases of weakly correlated features and highly unbalanced datasets. The skewness of the target is approached as a regular characteristic of the input data because most of consumers are fair and only a small portion attempt to mislead the utility companies by tampering with metering devices. Our fraud detection solutions consist of combining classifiers with an anomaly detection feature obtained with an unsupervised ML algorithm—Isolation Forest, and extensive feature engineering using SQL analytic functions on large datasets. Several techniques for feature processing enhanced the Area Under the Curve score for Decision Tree algorithm from 0.68 to 0.99. 
Nature Publishing Group UK 2022-02-28 /pmc/articles/PMC8885834/ /pubmed/35228648 http://dx.doi.org/10.1038/s41598-022-07337-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Oprea, Simona-Vasilica Bâra, Adela Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
title | Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
title_full | Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
title_fullStr | Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
title_full_unstemmed | Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
title_short | Feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
title_sort | feature engineering solution with structured query language analytic functions in detecting electricity frauds using machine learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8885834/ https://www.ncbi.nlm.nih.gov/pubmed/35228648 http://dx.doi.org/10.1038/s41598-022-07337-7 |
work_keys_str_mv | AT opreasimonavasilica featureengineeringsolutionwithstructuredquerylanguageanalyticfunctionsindetectingelectricityfraudsusingmachinelearning AT baraadela featureengineeringsolutionwithstructuredquerylanguageanalyticfunctionsindetectingelectricityfraudsusingmachinelearning |
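
The description field mentions feature engineering with SQL analytic functions. As a rough illustration of what such functions compute, the sketch below derives two per-client features with window functions. It is not the authors' feature set: the `invoices` table, its columns, and the readings are hypothetical and synthetic, and SQLite (3.25 or later, for window-function support) is assumed only because it ships with Python.

```python
import sqlite3

# Hypothetical schema and synthetic readings; not the paper's dataset.
rows = [
    # (client_id, invoice_month, kwh)
    (1, "2019-01", 300), (1, "2019-02", 310), (1, "2019-03", 290),
    (2, "2019-01", 400), (2, "2019-02", 120), (2, "2019-03", 110),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (client_id INTEGER, invoice_month TEXT, kwh REAL)"
)
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)

# Two illustrative analytic (window) features per invoice row:
#   client_avg_kwh  - the client's mean consumption over all its invoices
#   delta_prev_kwh  - change versus the client's previous invoice (NULL for the first)
query = """
SELECT client_id,
       invoice_month,
       kwh,
       AVG(kwh) OVER (PARTITION BY client_id) AS client_avg_kwh,
       kwh - LAG(kwh) OVER (
           PARTITION BY client_id ORDER BY invoice_month
       ) AS delta_prev_kwh
FROM invoices
ORDER BY client_id, invoice_month
"""
features = conn.execute(query).fetchall()

for row in features:
    print(row)
```

Unlike a `GROUP BY` aggregate, a window function keeps one output row per invoice, so each reading can be compared with its own client's history. A sharp drop such as client 2's second invoice then shows up directly as a large negative `delta_prev_kwh`.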