
Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

BACKGROUND: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of...


Bibliographic Details
Main Authors: Huang, Lingkang, Zhang, Hao Helen, Zeng, Zhao-Bang, Bushel, Pierre R.
Format: Online Article Text
Language: English
Published: Libertas Academica 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3740816/
https://www.ncbi.nlm.nih.gov/pubmed/23966761
http://dx.doi.org/10.4137/CIN.S10212
author Huang, Lingkang
Zhang, Hao Helen
Zeng, Zhao-Bang
Bushel, Pierre R.
collection PubMed
description BACKGROUND: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. RESULTS: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high-dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. CONCLUSIONS: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention. Availability: The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html.
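The abstract credits soft-thresholding type penalties with driving variable selection. The sketch below is only a generic illustration of that idea and is not the authors' MATLAB implementation: it applies the standard soft-thresholding operator to a small, made-up weight matrix (rows are genes, columns are classes) and reports which genes keep at least one nonzero weight. The names `soft_threshold`, `shrink_weights`, `selected_genes`, the weight values, and the threshold `lam` are all assumptions chosen for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def shrink_weights(weights, lam):
    """Apply soft-thresholding to every entry of a genes-by-classes weight matrix."""
    return [[soft_threshold(w, lam) for w in row] for row in weights]


def selected_genes(weights):
    """Return indices of genes (rows) that keep at least one nonzero weight."""
    return [g for g, row in enumerate(weights) if any(w != 0.0 for w in row)]


# Hypothetical weights for 4 genes and 3 tumor classes (illustrative values only).
weights = [
    [0.90, -0.40, 0.05],
    [0.02, -0.03, 0.01],
    [-0.60, 0.10, 0.70],
    [0.08, 0.00, -0.09],
]

shrunk = shrink_weights(weights, lam=0.1)
print(selected_genes(shrunk))  # prints [0, 2]
```

Genes whose weights are all small in magnitude are set exactly to zero for every class, so they drop out of the classification rule; that is the sense in which a shrinkage penalty yields a sparse solution.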
format Online
Article
Text
id pubmed-3740816
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3740816 2013-08-21 Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification Huang, Lingkang Zhang, Hao Helen Zeng, Zhao-Bang Bushel, Pierre R. Cancer Inform Original Research
Libertas Academica 2013-08-04 /pmc/articles/PMC3740816/ /pubmed/23966761 http://dx.doi.org/10.4137/CIN.S10212 Text en © 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
title Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification
topic Original Research