Effective hybrid feature selection using different bootstrap enhances cancers classification performance
BACKGROUND: Machine learning can be used to predict the onset of different human cancers. High-dimensional data pose substantial problems, among them an excessive number of genes, over-fitting, long fitting times, and reduced classification accuracy. Recursive Feature Elimination (RFE) is a w...
Main Authors: | Abdelwahed, Noura Mohammed; El-Tawel, Gh. S.; Makhlouf, M. A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9523996/ https://www.ncbi.nlm.nih.gov/pubmed/36175944 http://dx.doi.org/10.1186/s13040-022-00304-y |
_version_ | 1784800410801799168 |
---|---|
author | Abdelwahed, Noura Mohammed; El-Tawel, Gh. S.; Makhlouf, M. A. |
author_sort | Abdelwahed, Noura Mohammed |
collection | PubMed |
description | BACKGROUND: Machine learning can be used to predict the onset of different human cancers. High-dimensional data pose substantial problems, among them an excessive number of genes, over-fitting, long fitting times, and reduced classification accuracy. Recursive Feature Elimination (RFE) is a wrapper method for selecting the subset of features that yields the best accuracy. Despite its high performance, RFE has two disadvantages: computation time and over-fitting. Random forest for selection (RFS) has proved effective at selecting relevant features and mitigating over-fitting. METHOD: This paper proposes a method to enhance cancer classification performance: positions first bootstrap step (PFBS) random forest selection recursive feature elimination (RFS-RFE), abbreviated PFBS-RFS-RFE. It applies a bootstrap at one of three positions: the outer first bootstrap step (OFBS), the inner first bootstrap step (IFBS), and the outer/inner first bootstrap step (O/IFBS). In the first position, OFBS resamples the data with replacement before the selection step; RFS is then applied with bootstrap = false, i.e., the whole dataset is used to build each tree. The resulting feature importances are combined with RFE to select the most relevant subset of features. In the second position, IFBS resamples with replacement while RFS is being applied, and the feature importances are again combined with RFE. In the third position, O/IFBS combines the first and second positions. RFE uses logistic regression (LR) as its estimator. The proposed methods are combined with four classifiers to address the feature selection problems and improve the performance of RFE, and five datasets of different sizes are used to assess PFBS-RFS-RFE. RESULTS: O/IFBS-RFS-RFE achieved the best performance compared with previous work, bringing the accuracy, variance, and ROC area to 99.994%, 0.0000004, and 1.000 for the RNA gene dataset and to 100.000%, 0.0, and 1.000 for the dermatology erythemato-squamous diseases dataset. CONCLUSION: High-dimensional datasets and the RFE algorithm pose many difficulties for cancer classification. PFBS-RFS-RFE is proposed to address these difficulties using different bootstrap positions. The feature importances extracted from RFS are used with RFE to obtain the effective features. |
format | Online Article Text |
id | pubmed-9523996 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9523996 2022-10-01 Effective hybrid feature selection using different bootstrap enhances cancers classification performance. Abdelwahed, Noura Mohammed; El-Tawel, Gh. S.; Makhlouf, M. A. BioData Min. Research. BioMed Central 2022-09-30 /pmc/articles/PMC9523996/ /pubmed/36175944 http://dx.doi.org/10.1186/s13040-022-00304-y Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Effective hybrid feature selection using different bootstrap enhances cancers classification performance |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9523996/ https://www.ncbi.nlm.nih.gov/pubmed/36175944 http://dx.doi.org/10.1186/s13040-022-00304-y |
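
The three bootstrap positions described in the abstract (OFBS, IFBS, O/IFBS) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pfbs_rfs_rfe`, the parameters `n_prefilter` and `n_final`, the synthetic demo data, and the way random forest importances are combined with RFE (here, keeping the top-ranked features before running RFE) are all hypothetical, since the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample


def pfbs_rfs_rfe(X, y, position="O/IFBS", n_prefilter=20, n_final=5, seed=0):
    """Return sorted column indices chosen by a bootstrap + RFS + RFE sketch.

    position is "OFBS" (outer bootstrap only), "IFBS" (inner bootstrap only),
    or "O/IFBS" (both). n_prefilter and n_final are hypothetical parameters.
    """
    if position not in {"OFBS", "IFBS", "O/IFBS"}:
        raise ValueError(f"unknown position: {position}")
    outer = position in {"OFBS", "O/IFBS"}
    inner = position in {"IFBS", "O/IFBS"}

    # Outer step: resample rows with replacement before any selection.
    if outer:
        X, y = resample(X, y, replace=True, random_state=seed)

    # RFS: with bootstrap=False every tree sees all rows passed to fit;
    # with bootstrap=True the forest resamples rows for each tree.
    forest = RandomForestClassifier(
        n_estimators=100, bootstrap=inner, random_state=seed
    )
    forest.fit(X, y)

    # Assumed hybrid step: keep the n_prefilter most important features.
    keep = np.argsort(forest.feature_importances_)[::-1][:n_prefilter]

    # RFE with logistic regression as the estimator, as stated in the abstract.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_final)
    rfe.fit(X[:, keep], y)
    return np.sort(keep[rfe.support_])


# Synthetic stand-in for a gene expression matrix (not one of the paper's datasets).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)
selected = pfbs_rfs_rfe(X_demo, y_demo, position="O/IFBS")
```

Switching `position` between `"OFBS"`, `"IFBS"`, and `"O/IFBS"` changes only where the resampling happens; the selected indices could then be passed to any downstream classifier for evaluation.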