
Effective hybrid feature selection using different bootstrap enhances cancers classification performance

BACKGROUND: Machine learning can be used to predict the onset of different human cancers. High-dimensional data pose substantial, complicated problems, among them an excessive number of genes, over-fitting, long fitting times, and reduced classification accuracy. Recursive Feature Elimination (RFE) is a w...


Bibliographic Details
Main Authors: Abdelwahed, Noura Mohammed, El-Tawel, Gh. S., Makhlouf, M. A.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9523996/
https://www.ncbi.nlm.nih.gov/pubmed/36175944
http://dx.doi.org/10.1186/s13040-022-00304-y
author Abdelwahed, Noura Mohammed
El-Tawel, Gh. S.
Makhlouf, M. A.
collection PubMed
description BACKGROUND: Machine learning can be used to predict the onset of different human cancers. High-dimensional data pose substantial, complicated problems, among them an excessive number of genes, over-fitting, long fitting times, and reduced classification accuracy. Recursive Feature Elimination (RFE) is a wrapper method for selecting the subset of features that yields the best accuracy. Despite its high performance, RFE has two disadvantages: computation time and over-fitting. Random forest for selection (RFS) has proven effective at selecting relevant features and mitigating over-fitting. METHOD: This paper proposes a method, positions first bootstrap step (PFBS) random forest selection recursive feature elimination (RFS-RFE), abbreviated PFBS-RFS-RFE, to enhance cancer classification performance. It applies a bootstrap at one of three positions: the outer first bootstrap step (OFBS), the inner first bootstrap step (IFBS), and the outer/inner first bootstrap step (O/IFBS). In the first position, OFBS applies bootstrap resampling with replacement before the selection step. The RFS is then applied with bootstrap = false, i.e., the whole dataset is used to build each tree. The feature importances are combined with RFE to select the most relevant subset of features. In the second position, IFBS applies bootstrap resampling with replacement while the RFS is fitted, and the feature importances are again combined with RFE. In the third position, O/IFBS combines the first and second positions. RFE uses logistic regression (LR) as its estimator. The proposed methods are paired with four classifiers to address the feature selection problems and improve the performance of RFE, and five datasets of different sizes are used to assess the performance of PFBS-RFS-RFE.
RESULTS: The results showed that O/IFBS-RFS-RFE achieved the best performance compared with previous work, bringing the accuracy, variance, and ROC area to 99.994%, 0.0000004, and 1.000 for the RNA gene dataset and to 100.000%, 0.0, and 1.000 for the dermatology erythemato-squamous diseases dataset, respectively. CONCLUSION: High-dimensional datasets and the RFE algorithm face many difficulties in cancer classification. PFBS-RFS-RFE is proposed to address these difficulties using different bootstrap positions. The feature importances extracted from RFS are used with RFE to obtain the effective features.
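The three bootstrap positions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn, and it reads "combined with RFE" as one plausible pipeline: random forest importances pre-filter the features, then RFE with logistic regression refines them. The function name `pfbs_rfs_rfe`, the parameters `n_rf_keep` and `n_final`, the forest size, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

POSITIONS = ("OFBS", "IFBS", "O/IFBS")


def pfbs_rfs_rfe(X, y, position="O/IFBS", n_rf_keep=20, n_final=5, seed=0):
    """Return sorted column indices of the selected features (illustrative only)."""
    if position not in POSITIONS:
        raise ValueError(f"position must be one of {POSITIONS}")

    outer = position in ("OFBS", "O/IFBS")  # bootstrap before the selection step
    inner = position in ("IFBS", "O/IFBS")  # bootstrap inside the random forest

    if outer:
        # Outer step: resample the rows with replacement before any selection.
        X_fit, y_fit = resample(X, y, replace=True, random_state=seed)
    else:
        X_fit, y_fit = X, y

    # bootstrap=False means every tree is built on all rows of X_fit.
    forest = RandomForestClassifier(
        n_estimators=50, bootstrap=inner, random_state=seed
    )
    forest.fit(X_fit, y_fit)

    # Assumption: keep the n_rf_keep most important features as RFE candidates.
    top = np.argsort(forest.feature_importances_)[::-1][:n_rf_keep]

    # RFE with logistic regression as the estimator, as stated in the abstract.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_final)
    rfe.fit(X_fit[:, top], y_fit)
    return np.sort(top[rfe.support_])


# Synthetic stand-in data; the datasets used in the paper are not reproduced here.
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=5, random_state=0
)
for position in POSITIONS:
    print(position, pfbs_rfs_rfe(X, y, position=position))
```

The only difference between the three positions in this sketch is where resampling with replacement happens: on the data before selection, inside the forest through its `bootstrap` flag, or both.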
format Online
Article
Text
id pubmed-9523996
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9523996 2022-10-01 Effective hybrid feature selection using different bootstrap enhances cancers classification performance Abdelwahed, Noura Mohammed El-Tawel, Gh. S. Makhlouf, M. A. BioData Min Research BioMed Central 2022-09-30 /pmc/articles/PMC9523996/ /pubmed/36175944 http://dx.doi.org/10.1186/s13040-022-00304-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Effective hybrid feature selection using different bootstrap enhances cancers classification performance
topic Research