Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data

Bibliographic Details
Main authors: Khatun, Rabea, Akter, Maksuda, Islam, Md. Manowarul, Uddin, Md. Ashraf, Talukder, Md. Alamin, Kamruzzaman, Joarder, Azad, AKM, Paul, Bikash Kumar, Almoyad, Muhammad Ali Abdulllah, Aryal, Sunil, Moni, Mohammad Ali
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530870/
https://www.ncbi.nlm.nih.gov/pubmed/37761941
http://dx.doi.org/10.3390/genes14091802
author Khatun, Rabea
Akter, Maksuda
Islam, Md. Manowarul
Uddin, Md. Ashraf
Talukder, Md. Alamin
Kamruzzaman, Joarder
Azad, AKM
Paul, Bikash Kumar
Almoyad, Muhammad Ali Abdulllah
Aryal, Sunil
Moni, Mohammad Ali
collection PubMed
description Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods.
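The two ideas named in the description, rank aggregation across individual selection methods and weighted average voting across classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state which selection methods, aggregation rule, or weights the paper uses, so mean-rank aggregation, the gene names (`g1` to `g3`), the class labels (`ALL`, `AML`), and all scores, probabilities, and weights below are assumptions chosen only for illustration.

```python
from statistics import mean


def aggregate_ranks(score_tables, top_k):
    """Combine per-method feature scores into one ranking by mean rank.

    score_tables: list of dicts mapping feature name -> score (higher is
    better), one dict per individual selection method (assumed interface).
    Returns the top_k feature names, best first.
    """
    features = sorted(score_tables[0])
    per_method_ranks = []
    for scores in score_tables:
        # Rank 1 goes to the highest-scoring feature for this method.
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        per_method_ranks.append({f: r for r, f in enumerate(ordered, start=1)})
    mean_rank = {f: mean(r[f] for r in per_method_ranks) for f in features}
    # Lower mean rank is better; ties are broken by name for determinism.
    return sorted(features, key=lambda f: (mean_rank[f], f))[:top_k]


def weighted_vote(probabilities, weights):
    """Weighted average (soft) voting over per-model class probabilities.

    probabilities: list of dicts mapping class label -> probability,
    one dict per base model. weights: one non-negative weight per model.
    Returns the winning label and the averaged probabilities.
    """
    total = sum(weights)
    averaged = {
        label: sum(w * p[label] for w, p in zip(weights, probabilities)) / total
        for label in probabilities[0]
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical scores from three selection methods for three genes.
scores = [
    {"g1": 0.9, "g2": 0.2, "g3": 0.5},
    {"g1": 0.7, "g2": 0.1, "g3": 0.8},
    {"g1": 0.3, "g2": 0.4, "g3": 0.6},
]
selected = aggregate_ranks(scores, top_k=2)

# Hypothetical class probabilities from an SVM, a k-NN, and a decision tree.
probs = [
    {"ALL": 0.6, "AML": 0.4},
    {"ALL": 0.3, "AML": 0.7},
    {"ALL": 0.8, "AML": 0.2},
]
label, averaged = weighted_vote(probs, weights=[2, 1, 1])
print(selected, label)
```

In practice the per-method scores and class probabilities would come from fitted selectors and classifiers; the sketch only shows how their outputs might be combined.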
format Online
Article
Text
id pubmed-10530870
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10530870 2023-09-28 Genes (Basel) Article MDPI 2023-09-14 /pmc/articles/PMC10530870/ /pubmed/37761941 http://dx.doi.org/10.3390/genes14091802 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data
topic Article