Efficient Selection of Gaussian Kernel SVM Parameters for Imbalanced Data

Bibliographic Details
Main authors: Tsai, Chen-An; Chang, Yu-Jing
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048125/
https://www.ncbi.nlm.nih.gov/pubmed/36980852
http://dx.doi.org/10.3390/genes14030583
Description
Summary: In medical data mining, class prediction models are widely used for a variety of data classification problems. Classification models for high-dimensional gene expression datasets in particular have attracted many researchers seeking to identify marker genes that distinguish cancer cells from their corresponding normal cells. However, medical datasets often have skewed class distributions, in which at least one class has a relatively small number of observations. A classifier induced from such an imbalanced dataset typically has high accuracy for the majority class and poor prediction for the minority class. In this study, we focus on an SVM classifier with a Gaussian radial basis kernel for binary classification. To take advantage of the SVM and achieve the best generalization ability, we address two important problems: class imbalance and parameter selection during SVM parameter optimization. First, we propose a novel adjustment method, called b-SVM, for adjusting the cutoff threshold of the SVM. Second, we propose a fast and simple approach, called Min-max gamma selection, to optimize the model parameters of SVMs without carrying out an extensive k-fold cross-validation. An extensive comparison with a standard SVM and well-known existing methods is carried out on simulated and real datasets to evaluate the performance of our proposed algorithms. The experimental results show that our proposed algorithms outperform the over-sampling techniques and existing SVM-based solutions. This study also shows that the proposed Min-max gamma selection is at least 10 times faster than cross-validation selection, based on the average running time on six real datasets.
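
The summary does not spell out the b-SVM rule or the Min-max gamma criterion, so the following is only a generic sketch of the two ingredients it names: a Gaussian kernel with width parameter gamma, and a decision cutoff moved away from the default of 0 to help the minority class. The function names (`rbf_kernel`, `g_mean`, `best_cutoff`), the G-mean criterion, and the toy decision values are all assumptions made for illustration and are not taken from the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def g_mean(labels, scores, cutoff):
    """Geometric mean of sensitivity and specificity at a given cutoff.

    labels are +1 (minority) or -1 (majority); scores are SVM decision
    values f(x); a sample is predicted +1 when f(x) >= cutoff.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == -1 and s < cutoff)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return math.sqrt((tp / pos) * (tn / neg))


def best_cutoff(labels, scores):
    """Hypothetical selection rule: try the midpoints between sorted
    decision values (plus the default cutoff 0) and keep the one with
    the highest G-mean."""
    ordered = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(0.0)  # the standard SVM cutoff
    return max(candidates, key=lambda c: g_mean(labels, scores, c))


# Toy decision values: two minority samples, six majority samples.
# Every score is negative, so the default cutoff of 0 predicts
# "majority" for all eight samples.
labels = [1, 1, -1, -1, -1, -1, -1, -1]
scores = [-0.2, -0.4, -0.9, -1.1, -1.3, -1.5, -2.0, -1.0]

cutoff = best_cutoff(labels, scores)
print("G-mean at default cutoff 0:", g_mean(labels, scores, 0.0))
print("chosen cutoff:", cutoff)
print("G-mean at chosen cutoff:", g_mean(labels, scores, cutoff))
```

In this toy case the classifier ranks the minority samples above the majority ones but places the boundary on the wrong side of them, which is the situation where shifting the cutoff helps without retraining. In practice the cutoff would be chosen on training or validation decision values, not on the test set.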