Predicting Thalassemia Using Feature Selection Techniques: A Comparative Analysis

Thalassemia represents one of the most common genetic disorders worldwide, characterized by defects in hemoglobin synthesis. The affected individuals suffer from malfunctioning of one or more of the four globin genes, leading to chronic hemolytic anemia, an imbalance in the hemoglobin chain ratio, i...

Bibliographic Details
Main authors: Saleem, Muniba; Aslam, Waqar; Lali, Muhammad Ikram Ullah; Rauf, Hafiz Tayyab; Nasr, Emad Abouel
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10670018/
https://www.ncbi.nlm.nih.gov/pubmed/37998577
http://dx.doi.org/10.3390/diagnostics13223441
_version_ 1785149271041900544
author Saleem, Muniba
Aslam, Waqar
Lali, Muhammad Ikram Ullah
Rauf, Hafiz Tayyab
Nasr, Emad Abouel
collection PubMed
description Thalassemia represents one of the most common genetic disorders worldwide, characterized by defects in hemoglobin synthesis. The affected individuals suffer from malfunctioning of one or more of the four globin genes, leading to chronic hemolytic anemia, an imbalance in the hemoglobin chain ratio, iron overload, and ineffective erythropoiesis. Despite the challenges posed by this condition, recent years have witnessed significant advancements in diagnosis, therapy, and transfusion support, significantly improving the prognosis for thalassemia patients. This research empirically evaluates the efficacy of models constructed using classification methods and explores the effectiveness of relevant features that are derived using various machine-learning techniques. Five feature selection approaches, namely Chi-Square (χ²), Exploratory Factor Score (EFS), tree-based Recursive Feature Elimination (RFE), gradient-based RFE, and Linear Regression Coefficient, were employed to determine the optimal feature set. Nine classifiers, namely K-Nearest Neighbors (KNN), Decision Trees (DT), Gradient Boosting Classifier (GBC), Linear Regression (LR), AdaBoost, Extreme Gradient Boosting (XGB), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM), were utilized to evaluate the performance. The χ² method achieved accuracy, registering 91.56% precision, 91.04% recall, and 92.65% f-score when aligned with the LR classifier. Moreover, the results underscore that amalgamating over-sampling with Synthetic Minority Over-sampling Technique (SMOTE), RFE, and 10-fold cross-validation markedly elevates the detection accuracy for αT patients. Notably, the Gradient Boosting Classifier (GBC) achieves 93.46% accuracy, 93.89% recall, and 92.72% F1 score.
format Online
Article
Text
id pubmed-10670018
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10670018 2023-11-14 Predicting Thalassemia Using Feature Selection Techniques: A Comparative Analysis Saleem, Muniba Aslam, Waqar Lali, Muhammad Ikram Ullah Rauf, Hafiz Tayyab Nasr, Emad Abouel Diagnostics (Basel) Article Thalassemia represents one of the most common genetic disorders worldwide, characterized by defects in hemoglobin synthesis. The affected individuals suffer from malfunctioning of one or more of the four globin genes, leading to chronic hemolytic anemia, an imbalance in the hemoglobin chain ratio, iron overload, and ineffective erythropoiesis. Despite the challenges posed by this condition, recent years have witnessed significant advancements in diagnosis, therapy, and transfusion support, significantly improving the prognosis for thalassemia patients. This research empirically evaluates the efficacy of models constructed using classification methods and explores the effectiveness of relevant features that are derived using various machine-learning techniques. Five feature selection approaches, namely Chi-Square (χ²), Exploratory Factor Score (EFS), tree-based Recursive Feature Elimination (RFE), gradient-based RFE, and Linear Regression Coefficient, were employed to determine the optimal feature set. Nine classifiers, namely K-Nearest Neighbors (KNN), Decision Trees (DT), Gradient Boosting Classifier (GBC), Linear Regression (LR), AdaBoost, Extreme Gradient Boosting (XGB), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM), were utilized to evaluate the performance. The χ² method achieved accuracy, registering 91.56% precision, 91.04% recall, and 92.65% f-score when aligned with the LR classifier. Moreover, the results underscore that amalgamating over-sampling with Synthetic Minority Over-sampling Technique (SMOTE), RFE, and 10-fold cross-validation markedly elevates the detection accuracy for αT patients. Notably, the Gradient Boosting Classifier (GBC) achieves 93.46% accuracy, 93.89% recall, and 92.72% F1 score. MDPI 2023-11-14 /pmc/articles/PMC10670018/ /pubmed/37998577 http://dx.doi.org/10.3390/diagnostics13223441 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Predicting Thalassemia Using Feature Selection Techniques: A Comparative Analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10670018/
https://www.ncbi.nlm.nih.gov/pubmed/37998577
http://dx.doi.org/10.3390/diagnostics13223441