Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data

BACKGROUND: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is “large p and small n” in that data contain a small number of subjects but a large number of genes. It may affect the validity of the classification. Thus, there is a pressin...

Bibliographic Details
Main Authors: Wang, Zixuan, Zhou, Yi, Takagi, Tatsuya, Song, Jiangning, Tian, Yu-Shi, Shibuya, Tetsuo
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10082986/
https://www.ncbi.nlm.nih.gov/pubmed/37031189
http://dx.doi.org/10.1186/s12859-023-05267-3
_version_ 1785021415332773888
author Wang, Zixuan
Zhou, Yi
Takagi, Tatsuya
Song, Jiangning
Tian, Yu-Shi
Shibuya, Tetsuo
author_facet Wang, Zixuan
Zhou, Yi
Takagi, Tatsuya
Song, Jiangning
Tian, Yu-Shi
Shibuya, Tetsuo
author_sort Wang, Zixuan
collection PubMed
description BACKGROUND: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is “large p and small n” in that data contain a small number of subjects but a large number of genes. It may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies–Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05267-3.
format Online
Article
Text
id pubmed-10082986
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10082986 2023-04-10 Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data Wang, Zixuan Zhou, Yi Takagi, Tatsuya Song, Jiangning Tian, Yu-Shi Shibuya, Tetsuo BMC Bioinformatics Research BACKGROUND: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is “large p and small n” in that data contain a small number of subjects but a large number of genes. It may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies–Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05267-3.
BioMed Central 2023-04-08 /pmc/articles/PMC10082986/ /pubmed/37031189 http://dx.doi.org/10.1186/s12859-023-05267-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Wang, Zixuan
Zhou, Yi
Takagi, Tatsuya
Song, Jiangning
Tian, Yu-Shi
Shibuya, Tetsuo
Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
title Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
title_full Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
title_fullStr Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
title_full_unstemmed Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
title_short Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
title_sort genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10082986/
https://www.ncbi.nlm.nih.gov/pubmed/37031189
http://dx.doi.org/10.1186/s12859-023-05267-3
work_keys_str_mv AT wangzixuan geneticalgorithmbasedfeatureselectionwithmanifoldlearningforcancerclassificationusingmicroarraydata
AT zhouyi geneticalgorithmbasedfeatureselectionwithmanifoldlearningforcancerclassificationusingmicroarraydata
AT takagitatsuya geneticalgorithmbasedfeatureselectionwithmanifoldlearningforcancerclassificationusingmicroarraydata
AT songjiangning geneticalgorithmbasedfeatureselectionwithmanifoldlearningforcancerclassificationusingmicroarraydata
AT tianyushi geneticalgorithmbasedfeatureselectionwithmanifoldlearningforcancerclassificationusingmicroarraydata
AT shibuyatetsuo geneticalgorithmbasedfeatureselectionwithmanifoldlearningforcancerclassificationusingmicroarraydata
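
The description field above says the Davies–Bouldin index is used to score candidate gene subsets after embedding, so that no classifier is needed during selection. As a rough illustration only, the following pure-Python sketch computes the standard Davies–Bouldin index for labelled points; lower values mean tighter, better-separated classes. The function name `davies_bouldin` and the toy 2-D points (standing in for an embedding of a selected gene subset) are assumptions made for this example and are not taken from the article, whose exact scoring procedure may differ.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index of labelled points (lower is better).

    Hypothetical helper for illustration; not the article's implementation.
    Assumes at least two classes with distinct centroids.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("need at least two classes")

    centroids, scatter = {}, {}
    for label, pts in groups.items():
        dim = len(pts[0])
        centroid = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
        centroids[label] = centroid
        # Scatter: mean Euclidean distance of a class's points to its centroid.
        scatter[label] = sum(math.dist(p, centroid) for p in pts) / len(pts)

    total = 0.0
    for i in groups:
        # Worst-case similarity ratio of class i against every other class.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
    return total / len(groups)


# Toy stand-in for an embedding: two tight groups far apart along x.
embedding = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(davies_bouldin(embedding, [0, 0, 1, 1]))  # labels match the groups: 0.2
print(davies_bouldin(embedding, [0, 1, 0, 1]))  # labels cut across them: 5.0
```

In a genetic-algorithm setting, a score like this could serve as the fitness of a candidate subset, with smaller values preferred; how the embedding is produced and how the score feeds back into selection is left out of this sketch.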