A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data
The high dimensionality and sparsity of the microarray gene expression data make it challenging to analyze and screen the optimal subset of genes as predictors of breast cancer (BC). The authors in the present study propose a novel hybrid Feature Selection (FS) sequential framework involving minimum...
Main authors: | Alromema, Nashwan; Syed, Asif Hassan; Khan, Tabrej |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955903/ https://www.ncbi.nlm.nih.gov/pubmed/36832196 http://dx.doi.org/10.3390/diagnostics13040708 |
_version_ | 1784894461520642048 |
---|---|
author | Alromema, Nashwan Syed, Asif Hassan Khan, Tabrej |
author_facet | Alromema, Nashwan Syed, Asif Hassan Khan, Tabrej |
author_sort | Alromema, Nashwan |
collection | PubMed |
description | The high dimensionality and sparsity of the microarray gene expression data make it challenging to analyze and screen the optimal subset of genes as predictors of breast cancer (BC). The authors in the present study propose a novel hybrid Feature Selection (FS) sequential framework involving minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the most optimal set of gene biomarkers as predictors for BC. The proposed framework identified a set of three most optimal gene biomarkers, namely, MAPK 1, APOBEC3B, and ENAH. In addition, the state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naïve Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR) were used to test the predictive capability of the selected gene biomarkers and select the most effective breast cancer diagnostic model with higher values of performance metrics. Our study found that the XGBoost-based model was the superior performer with an accuracy of 0.976 ± 0.027, an F1-Score of 0.974 ± 0.030, and an AUC value of 0.961 ± 0.035 when tested on an independent test dataset. The screened gene biomarkers-based classification system efficiently detects primary breast tumors from normal breast samples. |
format | Online Article Text |
id | pubmed-9955903 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-99559032023-02-25 A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data Alromema, Nashwan Syed, Asif Hassan Khan, Tabrej Diagnostics (Basel) Article The high dimensionality and sparsity of the microarray gene expression data make it challenging to analyze and screen the optimal subset of genes as predictors of breast cancer (BC). The authors in the present study propose a novel hybrid Feature Selection (FS) sequential framework involving minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the most optimal set of gene biomarkers as predictors for BC. The proposed framework identified a set of three most optimal gene biomarkers, namely, MAPK 1, APOBEC3B, and ENAH. In addition, the state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naïve Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR) were used to test the predictive capability of the selected gene biomarkers and select the most effective breast cancer diagnostic model with higher values of performance metrics. Our study found that the XGBoost-based model was the superior performer with an accuracy of 0.976 ± 0.027, an F1-Score of 0.974 ± 0.030, and an AUC value of 0.961 ± 0.035 when tested on an independent test dataset. The screened gene biomarkers-based classification system efficiently detects primary breast tumors from normal breast samples. MDPI 2023-02-13 /pmc/articles/PMC9955903/ /pubmed/36832196 http://dx.doi.org/10.3390/diagnostics13040708 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Alromema, Nashwan Syed, Asif Hassan Khan, Tabrej A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data |
title | A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data |
title_full | A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data |
title_fullStr | A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data |
title_full_unstemmed | A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data |
title_short | A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data |
title_sort | hybrid machine learning approach to screen optimal predictors for the classification of primary breast tumors from gene expression microarray data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955903/ https://www.ncbi.nlm.nih.gov/pubmed/36832196 http://dx.doi.org/10.3390/diagnostics13040708 |
work_keys_str_mv | AT alromemanashwan ahybridmachinelearningapproachtoscreenoptimalpredictorsfortheclassificationofprimarybreasttumorsfromgeneexpressionmicroarraydata AT syedasifhassan ahybridmachinelearningapproachtoscreenoptimalpredictorsfortheclassificationofprimarybreasttumorsfromgeneexpressionmicroarraydata AT khantabrej ahybridmachinelearningapproachtoscreenoptimalpredictorsfortheclassificationofprimarybreasttumorsfromgeneexpressionmicroarraydata AT alromemanashwan hybridmachinelearningapproachtoscreenoptimalpredictorsfortheclassificationofprimarybreasttumorsfromgeneexpressionmicroarraydata AT syedasifhassan hybridmachinelearningapproachtoscreenoptimalpredictorsfortheclassificationofprimarybreasttumorsfromgeneexpressionmicroarraydata AT khantabrej hybridmachinelearningapproachtoscreenoptimalpredictorsfortheclassificationofprimarybreasttumorsfromgeneexpressionmicroarraydata |
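
The description field above names a two-tailed unpaired t-test as one filter stage of the feature-selection framework. The record does not say which variant of the test was used or how it was applied, so the following is only a minimal sketch under stated assumptions: it uses Welch's unequal-variance form, ranks genes by the absolute t-statistic as a rough stand-in for a two-tailed p-value cutoff, and runs on synthetic data. The gene names `GENE_A` to `GENE_C` and the helper names `welch_t` and `rank_genes` are hypothetical and do not come from the article.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Unpaired t-statistic for two samples (Welch form; an assumption here)."""
    var_a = statistics.variance(a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def rank_genes(tumor, normal):
    """Return gene names sorted by |t|, largest first.

    Both arguments map a gene name to a list of expression values.
    Using |t| treats over- and under-expression alike (two-tailed).
    """
    scores = {g: abs(welch_t(tumor[g], normal[g])) for g in tumor}
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic data: only the hypothetical GENE_A is shifted in "tumor" samples.
rng = random.Random(0)
genes = ["GENE_A", "GENE_B", "GENE_C"]
shift = {"GENE_A": 3.0, "GENE_B": 0.0, "GENE_C": 0.0}
tumor = {g: [rng.gauss(shift[g], 1.0) for _ in range(30)] for g in genes}
normal = {g: [rng.gauss(0.0, 1.0) for _ in range(30)] for g in genes}

ranking = rank_genes(tumor, normal)
print(ranking)
```

In a pipeline of the kind the description outlines, a filter like this would only shortlist candidate genes; the mRMR and meta-heuristic stages and the classifier comparison are separate steps that this sketch does not attempt to reproduce.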