
Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data

Bibliographic details
Main authors: Wang, Zixuan, Zhou, Yi, Takagi, Tatsuya, Song, Jiangning, Tian, Yu-Shi, Shibuya, Tetsuo
Format: Online Article Text
Language: English
Published: BioMed Central, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10082986/
https://www.ncbi.nlm.nih.gov/pubmed/37031189
http://dx.doi.org/10.1186/s12859-023-05267-3
Description
Summary:
BACKGROUND: Microarray data have been widely used for cancer classification. Their main characteristic is “large p and small n”: a dataset contains only a small number of subjects but a very large number of genes, which can undermine the validity of the classification. There is therefore a pressing demand for techniques that can select the genes relevant to cancer classification.
RESULTS: This study proposes Iso-GA, a novel feature (gene) selection method for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with a genetic algorithm (GA) to account for the latent nonlinear structure of gene expression in microarray data. The Davies–Bouldin index is adopted to evaluate candidate solutions in the Isomap embedding, which avoids the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the likelihood that genes are selected by the GA at random. Iso-GA was evaluated on eight benchmark cancer microarray datasets, where it outperformed the other gene selection methods it was benchmarked against, achieving good classification accuracy with fewer critical genes selected.
CONCLUSIONS: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05267-3.
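The summary describes the method only at a high level, so the following is a minimal NumPy sketch of the general idea, not the authors' Iso-GA. It assumes that a candidate solution is a binary gene mask, that its fitness is the Davies–Bouldin index (lower is better) computed on the Isomap embedding of the selected genes using the known class labels as clusters, and that a plain GA (tournament selection, uniform crossover, bit-flip mutation, elitist replacement) minimizes that fitness. All function names (`isomap_embed`, `davies_bouldin`, `fitness`, `run_ga`), the parameter values, and the step that joins disconnected k-nearest-neighbour components are illustrative assumptions. The paper's probability-based framework for reducing random gene selection is not reproduced.

```python
import numpy as np


def isomap_embed(X, n_neighbors=5, n_components=2):
    """Minimal Isomap: kNN graph -> geodesic (shortest-path) distances -> classical MDS."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Euclidean distances
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):  # symmetric kNN graph
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]
    while True:
        for k in range(n):  # Floyd-Warshall all-pairs shortest paths
            G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
        reach = np.isfinite(G[0])
        if reach.all():
            break
        # Assumption: join disconnected components by their shortest Euclidean edge
        sub = np.where(reach[:, None] & ~reach[None, :], D, np.inf)
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        G[i, j] = G[j, i] = D[i, j]
    H = np.eye(n) - 1.0 / n  # centering matrix I - (1/n) * ones
    B = -0.5 * H @ (G ** 2) @ H  # double-centred squared geodesic distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def davies_bouldin(Z, y):
    """Davies-Bouldin index: mean over classes of max_j (s_i + s_j) / d(c_i, c_j). Lower is better."""
    classes = np.unique(y)
    cents = np.array([Z[y == c].mean(axis=0) for c in classes])
    scat = np.array([np.linalg.norm(Z[y == c] - cents[i], axis=1).mean()
                     for i, c in enumerate(classes)])
    worst = [max((scat[i] + scat[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(classes)) if j != i)
             for i in range(len(classes))]
    return float(np.mean(worst))


def fitness(mask, X, y, n_neighbors=5):
    """Hypothetical fitness: DB index of the Isomap embedding of the selected genes."""
    if not mask.any():
        return np.inf  # an empty gene subset is invalid
    return davies_bouldin(isomap_embed(X[:, mask], n_neighbors), y)


def run_ga(X, y, pop_size=20, generations=15, p_init=0.05, p_mut=0.01, seed=0):
    """Plain binary GA minimizing `fitness`; returns the best mask and its score."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    pop = rng.random((pop_size, n_genes)) < p_init  # sparse random initial masks
    scores = np.array([fitness(m, X, y) for m in pop])
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
            p1 = pop[a[np.argmin(scores[a])]]  # binary tournament selection
            p2 = pop[b[np.argmin(scores[b])]]
            child = np.where(rng.random(n_genes) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n_genes) < p_mut  # bit-flip mutation
            children.append(child)
        children = np.array(children)
        child_scores = np.array([fitness(m, X, y) for m in children])
        # Elitism: keep the best pop_size individuals among parents and children
        allpop = np.vstack([pop, children])
        allscores = np.concatenate([scores, child_scores])
        keep = np.argsort(allscores)[:pop_size]
        pop, scores = allpop[keep], allscores[keep]
    best = int(np.argmin(scores))
    return pop[best], scores[best]


if __name__ == "__main__":
    # Synthetic "large p, small n" data: 30 subjects, 200 genes, genes 0-4 informative
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 15)
    X = rng.normal(size=(30, 200))
    X[y == 1, :5] += 3.0
    mask, score = run_ga(X, y)
    print("selected genes:", np.flatnonzero(mask), "DB index:", round(float(score), 3))
```

Using the Davies–Bouldin index rather than a classifier's accuracy as the fitness is what makes the search classifier-independent: a subset is scored only by how compact and well separated the classes are in the low-dimensional manifold. Since the sketch adds no explicit penalty on subset size, any preference for small gene sets here comes only from the sparse initialization and from noise genes worsening the index.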