Addressing Binary Classification over Class Imbalanced Clinical Datasets Using Computationally Intelligent Techniques

Bibliographic Details
Main authors: Kumar, Vinod; Lalotra, Gotam Singh; Sasikala, Ponnusamy; Rajput, Dharmendra Singh; Kaluri, Rajesh; Lakshmanna, Kuruva; Shorfuzzaman, Mohammad; Alsufyani, Abdulmajeed; Uddin, Mueen
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9322725/
https://www.ncbi.nlm.nih.gov/pubmed/35885819
http://dx.doi.org/10.3390/healthcare10071293
author Kumar, Vinod
Lalotra, Gotam Singh
Sasikala, Ponnusamy
Rajput, Dharmendra Singh
Kaluri, Rajesh
Lakshmanna, Kuruva
Shorfuzzaman, Mohammad
Alsufyani, Abdulmajeed
Uddin, Mueen
author_sort Kumar, Vinod
collection PubMed
description Healthcare is a prime need of every human being, and clinical datasets play an important role in developing intelligent healthcare systems for monitoring people's health. Most real-world datasets are inherently class imbalanced, and clinical datasets suffer from the same problem; imbalanced class distributions pose several issues when training classifiers. Consequently, classifiers suffer from low accuracy, precision, and recall, and from a high degree of misclassification. We performed a brief literature review on the class imbalanced learning scenario. This study carries out an empirical performance evaluation of six classifiers (Decision Tree, k-Nearest Neighbor, Logistic Regression, Artificial Neural Network, Support Vector Machine, and Gaussian Naïve Bayes) over five imbalanced clinical datasets (Breast Cancer Disease, Coronary Heart Disease, Indian Liver Patient, Pima Indians Diabetes Database, and Coronary Kidney Disease) with respect to seven class balancing techniques (Undersampling, Random oversampling, SMOTE, ADASYN, SVM-SMOTE, SMOTEEN, and SMOTETOMEK). In addition, explanations for the superiority of particular classifiers and data-balancing techniques are explored. Furthermore, we discuss recommendations on how to tackle class imbalanced datasets when training different supervised machine learning methods. Result analysis demonstrates that the SMOTEEN balancing method often performed better than the other six data-balancing techniques, across all six classifiers and all five clinical datasets. The other six balancing techniques performed almost equally to one another, and moderately worse than SMOTEEN.
format Online
Article
Text
id pubmed-9322725
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9322725 2022-07-27 Healthcare (Basel) Article MDPI 2022-07-13 /pmc/articles/PMC9322725/ /pubmed/35885819 http://dx.doi.org/10.3390/healthcare10071293 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Addressing Binary Classification over Class Imbalanced Clinical Datasets Using Computationally Intelligent Techniques
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9322725/
https://www.ncbi.nlm.nih.gov/pubmed/35885819
http://dx.doi.org/10.3390/healthcare10071293