A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data

Bibliographic Details
Main Authors: Alromema, Nashwan; Syed, Asif Hassan; Khan, Tabrej
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955903/
https://www.ncbi.nlm.nih.gov/pubmed/36832196
http://dx.doi.org/10.3390/diagnostics13040708
Journal: Diagnostics (Basel)
Publication date: 2023-02-13
Collection: PubMed
Record ID: pubmed-9955903
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract
The high dimensionality and sparsity of microarray gene expression data make it challenging to analyze such data and to screen the optimal subset of genes as predictors of breast cancer (BC). In the present study, the authors propose a novel hybrid sequential Feature Selection (FS) framework involving minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the optimal set of gene biomarkers as predictors of BC. The proposed framework identified a set of three optimal gene biomarkers, namely MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naïve Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model, i.e., the one with the highest performance metrics. Our study found that the XGBoost-based model was the best performer, with an accuracy of 0.976 ± 0.027, an F1-score of 0.974 ± 0.030, and an AUC of 0.961 ± 0.035 when tested on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
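The abstract names mRMR as the first filtering stage but gives none of its settings. The sketch below shows one common greedy form: at each step it picks the gene with the highest mutual information with the class label, minus its mean mutual information with the genes already chosen (the "difference" criterion). Because microarray values are continuous, it assumes a three-level discretization at mean ± k·SD. That discretization, and the names `discretize`, `mutual_information`, and `mrmr_select`, are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def discretize(values, k=1.0):
    """Map continuous expression values to three levels (-1, 0, 1).

    Assumption: thresholds at mean +/- k * standard deviation; the record
    does not say how (or whether) the authors discretized the data.
    """
    mu, sd = mean(values), pstdev(values)
    return [-1 if v < mu - k * sd else (1 if v > mu + k * sd else 0) for v in values]


def mutual_information(x, y):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def mrmr_select(features, labels, n_select):
    """Greedy mRMR with the difference criterion: relevance - mean redundancy.

    features: list of discrete feature columns (one list per gene)
    labels:   discrete class label per sample
    Returns indices of the selected features, in selection order.
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    candidates = set(range(len(features)))
    while candidates and len(selected) < n_select:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: most relevant feature
            redundancy = mean(mutual_information(features[j], features[s]) for s in selected)
            return relevance[j] - redundancy
        best = max(sorted(candidates), key=score)  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 0, 1, 1, 1, 1, 2, 2]
    f0 = [0, 0, 0, 0, 1, 1, 1, 1]
    f1 = list(f0)                  # a redundant copy of f0
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]  # equally relevant, independent of f0
    print(mrmr_select([f0, f1, f2], y, 2))  # → [0, 2]
```

In the toy data, all three features are equally relevant to the label, but once `f0` is chosen its copy `f1` is penalized as fully redundant, so the complementary `f2` is picked instead. This redundancy penalty is what distinguishes mRMR from ranking genes by relevance alone.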
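The abstract also lists a two-tailed unpaired t-test as a filter that separates tumor from normal samples. The following dependency-free sketch makes several assumptions the record does not settle. It uses Welch's unequal-variance form, since the record does not say whether a pooled or Welch variant was used. It keeps genes at a significance level of 0.05. It also assumes a hypothetical `dict` input mapping each gene to its (tumor, normal) values. The two-tailed p-value is computed as the regularized incomplete beta function I_{df/(df+t²)}(df/2, 1/2), evaluated with the standard continued-fraction expansion.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def welch_t_test(group_a, group_b):
    """Two-tailed unpaired Welch t-test. Returns (t, df, p_value).

    Each group needs at least two values and the groups must not both be constant.
    """
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    p = regularized_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


def filter_genes(expression, alpha=0.05):
    """Keep genes whose tumor vs. normal difference is significant at alpha.

    expression: hypothetical dict gene -> (tumor_values, normal_values)
    Returns dict gene -> p-value for the genes that pass.
    """
    kept = {}
    for gene, (tumor, normal) in expression.items():
        _, _, p = welch_t_test(tumor, normal)
        if p < alpha:
            kept[gene] = p
    return kept
```

The hand-written incomplete beta only keeps the sketch free of third-party dependencies. In practice, a statistics library's tested t-test routine is preferable.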
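The abstract says "meta-heuristics" without naming the algorithm. As a plainly labeled stand-in, the sketch below uses simulated annealing to search over binary gene masks, which is one generic way to run a wrapper-style metaheuristic over a small candidate set left after filtering. The `fitness` callable is hypothetical. In a real wrapper it might be a cross-validated classifier score minus a subset-size penalty. The cooling schedule and iteration count are arbitrary assumptions.

```python
import math
import random


def anneal_feature_subset(n_features, fitness, n_iter=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated-annealing search over binary feature masks (a stand-in metaheuristic).

    fitness: hypothetical callable(mask) -> float, higher is better.
    Returns (best_mask, best_score).
    """
    rng = random.Random(seed)
    current = [0] * n_features           # start from the empty subset
    current_score = fitness(current)
    best, best_score = list(current), current_score
    for i in range(n_iter):
        # geometric cooling from t_start down to t_end
        temp = t_start * (t_end / t_start) ** (i / max(1, n_iter - 1))
        candidate = list(current)
        j = rng.randrange(n_features)
        candidate[j] = 1 - candidate[j]    # flip one gene in or out of the subset
        cand_score = fitness(candidate)
        delta = cand_score - current_score
        # always accept improvements; accept worse moves with probability exp(delta / temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score
```

Accepting occasional worse moves early on lets the search escape local optima. As the temperature falls, the search behaves more and more like greedy bit-flipping.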
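The abstract reports accuracy, F1-score, and AUC as mean ± standard deviation, but the record does not say what the spread is computed over (for example, repeated train/test splits). The sketch below shows only how these metrics are conventionally defined for a binary tumor-vs-normal task, plus a mean ± sample-SD summary. The 0.5 decision threshold in the usage example and all helper names are assumptions.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Format results from repeated runs as mean ± sample standard deviation."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


if __name__ == "__main__":
    y = [0, 0, 1, 1]                     # 1 = tumor, 0 = normal
    scores = [0.1, 0.4, 0.35, 0.8]       # predicted tumor probabilities
    preds = [int(s >= 0.5) for s in scores]
    print(accuracy(y, preds))            # → 0.75
    print(roc_auc(y, scores))            # → 0.75
```

AUC is computed from the raw scores and does not depend on the threshold, whereas accuracy and F1 depend on where the threshold is placed. This is one reason the three figures need not move together.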