Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data

BACKGROUND: Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in t...

Bibliographic Details
Main authors:	Zhang, Xuegong; Lu, Xin; Shi, Qian; Xu, Xiu-qin; Leung, Hon-chiu E; Harris, Lyndsay N; Iglehart, James D; Miron, Alexander; Liu, Jun S; Wong, Wing H
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1456993/ https://www.ncbi.nlm.nih.gov/pubmed/16606446 http://dx.doi.org/10.1186/1471-2105-7-197

collection	PubMed
description	BACKGROUND: Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. RESULTS: We developed a recursive support vector machine (R-SVM) algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to a similar, state-of-the-art method (SVM recursive feature elimination, or SVM-RFE), paying special attention to the ability to recover the true informative genes/biomarkers and to robustness to outliers in the data. Simulation experiments show that a 5% to 20% improvement over SVM-RFE can be achieved with regard to these properties. The SVM-based methods are also compared with a conventional univariate method, and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments. CONCLUSION: The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data, and it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features. The multivariate SVM-based method outperforms the univariate method in classification performance, but univariate methods can reveal more of the differentially expressed features, especially when there are correlations between the features.
format	Text
id	pubmed-1456993
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1456993, 2006-05-04. BMC Bioinformatics. Methodology Article. BioMed Central, 2006-04-10. Text (en). Copyright © 2006 Zhang et al; licensee BioMed Central Ltd.