
Utilizing data sampling techniques on algorithmic fairness for customer churn prediction with data imbalance problems

Background: Customer churn prediction (CCP) refers to detecting which customers are likely to cancel the services provided by a service provider, for example, internet services. The class imbalance problem (CIP) in machine learning occurs when there is a huge difference in the samples of the positiv...


Bibliographic Details
Main Authors:	Maw, Maw; Haw, Su-Cheng; Ho, Chin-Kuan
Format:	Online Article Text
Language:	English
Published:	F1000 Research Limited 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9428497/ https://www.ncbi.nlm.nih.gov/pubmed/36071889 http://dx.doi.org/10.12688/f1000research.72929.2

_version_	1784779132798763008
author	Maw, Maw Haw, Su-Cheng Ho, Chin-Kuan
collection	PubMed
description	Background: Customer churn prediction (CCP) refers to detecting which customers are likely to cancel the services provided by a service provider, for example, internet services. The class imbalance problem (CIP) in machine learning occurs when the number of samples in the positive class differs greatly from the number in the negative class. It is one of the major obstacles in CCP because it degrades classification performance. Utilizing data sampling techniques (DSTs) helps to resolve the CIP to some extent. Methods: In this paper, we examine the effect of using DSTs on algorithmic fairness, i.e., we investigate whether the results discriminate between male and female groups and compare the results before and after using DSTs. Three real-world datasets with unequal balancing rates were prepared and four ubiquitous DSTs were applied to them. Six popular classification techniques were utilized in the classification process. Both classifier performance and algorithmic fairness were evaluated with notable metrics. Results: The results indicated that the Random Forest classifier outperformed the other classifiers on all three datasets and that using the SMOTE and ADASYN techniques caused more discrimination in the female group. The rate of unintentional discrimination seems to be higher in the original data of extremely unbalanced datasets under the following classifiers: Logistic Regression, LightGBM, and XGBoost. Conclusions: Algorithmic fairness has become a broadly studied area in recent years, yet there is very little systematic study of the effect of using DSTs on algorithmic fairness. This study presents important findings to further the use of algorithmic fairness in CCP research.
format	Online Article Text
id	pubmed-9428497
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	F1000 Research Limited
record_format	MEDLINE/PubMed
spelling	pubmed-9428497 2022-09-06 F1000Res Research Article
F1000 Research Limited 2022-06-27 /pmc/articles/PMC9428497/ /pubmed/36071889 http://dx.doi.org/10.12688/f1000research.72929.2 Text en Copyright: © 2022 Maw M et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Utilizing data sampling techniques on algorithmic fairness for customer churn prediction with data imbalance problems
topic	Research Article
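
The workflow summarized in the abstract (rebalance an imbalanced churn dataset, then compare outcomes for male and female groups) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record names SMOTE and ADASYN but does not list all four DSTs or the fairness metrics used, so this sketch substitutes plain random oversampling and the statistical parity difference, and the field names `churn` and `gender` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, label_key="churn", seed=0):
    """Duplicate minority-class rows at random until every class has the
    same number of rows as the largest class.

    Illustrative stand-in for a data sampling technique; it is NOT SMOTE
    or ADASYN, which synthesize new samples instead of copying rows.
    """
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Top up smaller classes with random copies of their own rows.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def positive_rate(predictions, groups, group):
    """Share of members of `group` that received a positive (churn) prediction."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def statistical_parity_difference(predictions, groups,
                                  unprivileged="female", privileged="male"):
    """Positive-prediction rate of the unprivileged group minus that of the
    privileged group; 0 means both groups are flagged at the same rate."""
    return (positive_rate(predictions, groups, unprivileged)
            - positive_rate(predictions, groups, privileged))


# Hypothetical toy data: 8 non-churners and 2 churners.
rows = ([{"gender": "female", "churn": 0}] * 4
        + [{"gender": "male", "churn": 0}] * 4
        + [{"gender": "female", "churn": 1}, {"gender": "male", "churn": 1}])
balanced = random_oversample(rows)
class_counts = Counter(row["churn"] for row in balanced)

# Hypothetical classifier outputs for four female and four male customers.
predictions = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["female"] * 4 + ["male"] * 4
spd = statistical_parity_difference(predictions, groups)
```

Computing the same metric on predictions from a model trained on the original data and on the rebalanced data gives the kind of before-and-after comparison the abstract describes.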
