Overcome Support Vector Machine Diagnosis Overfitting

Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, there is no previous research to analyze their overfitting in high-dimensional omics data based disease diagnosis, which is essential to avoid deceptive diagnostic resul...

Bibliographic Details
Main Authors: Han, Henry, Jiang, Xiaoqian
Format: Online Article Text
Language: English
Published: Libertas Academica 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4264614/
https://www.ncbi.nlm.nih.gov/pubmed/25574125
http://dx.doi.org/10.4137/CIN.S13875
author Han, Henry
Jiang, Xiaoqian
collection PubMed
description Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, there is no previous research to analyze their overfitting in high-dimensional omics data based disease diagnosis, which is essential to avoid deceptive diagnostic results and enhance clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We found that disease diagnosis under an SVM classifier would inevitably encounter overfitting under a Gaussian kernel because of the large data variations generated from high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad-hoc parametric tuning approaches, it not only robustly conquers the overfitting problem, but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm: Gene-Switch-Marker (GSM) to capture meaningful biomarkers by taking advantage of SVM overfitting on single genes.
format Online
Article
Text
id pubmed-4264614
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4264614 2015-01-08 Overcome Support Vector Machine Diagnosis Overfitting Han, Henry Jiang, Xiaoqian Cancer Inform Review Libertas Academica 2014-12-09 /pmc/articles/PMC4264614/ /pubmed/25574125 http://dx.doi.org/10.4137/CIN.S13875 Text en © 2014 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Overcome Support Vector Machine Diagnosis Overfitting
topic Review