Sparse logistic regression with a L(1/2) penalty for gene selection in cancer classification

BACKGROUND: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve the prediction accuracy. The regularization approach is one of the effective methods for gene selection in microarray...

Bibliographic Details
Main authors:	Liang, Yong; Liu, Cheng; Luan, Xin-Ze; Leung, Kwong-Sak; Chan, Tak-Ming; Xu, Zong-Ben; Zhang, Hai
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3718705/ https://www.ncbi.nlm.nih.gov/pubmed/23777239 http://dx.doi.org/10.1186/1471-2105-14-198

_version_	1782277807684976640
author	Liang, Yong Liu, Cheng Luan, Xin-Ze Leung, Kwong-Sak Chan, Tak-Ming Xu, Zong-Ben Zhang, Hai
author_sort	Liang, Yong
collection	PubMed
description	BACKGROUND: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers helps to classify different cancer types and improves prediction accuracy. Regularization is an effective approach to gene selection in microarray data, which generally contain a large number of genes but only a small number of samples. In recent years, various approaches have been developed for gene selection in microarray data. They generally fall into three categories: filter, wrapper, and embedded methods. Regularization methods are an important embedded technique that performs continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. A popular regularization technique is the Lasso (L(1)), and many L(1)-type regularization terms have been proposed in recent years. In theory, Lq-type regularization with a lower value of q leads to better solutions with more sparsity. Moreover, L(1/2) regularization can be taken as representative of Lq (0 < q < 1) regularization and has been shown to have many attractive properties. RESULTS: In this work, we investigate sparse logistic regression with the L(1/2) penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L(1/2) penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of the proposed approach compared with other regularization methods. In particular, on four publicly available gene expression datasets, the L(1/2) regularization method succeeded using only about 2 to 14 predictors (genes), compared with about 6 to 38 genes for the ordinary L(1) and elastic net regularization approaches.
CONCLUSIONS: Our evaluations show that sparse logistic regression with the L(1/2) penalty achieves higher classification accuracy than the ordinary L(1) and elastic net regularization approaches while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, sparse logistic regression with the L(1/2) penalty is an effective technique for gene selection in real classification problems.
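
The abstract refers to a univariate half thresholding operator inside a coordinate descent loop but does not state its formula. As a rough illustration only, the sketch below implements the closed-form half thresholding function commonly given in the L(1/2) regularization literature for the scalar problem min over x of (x - z)^2 + lam * sqrt(|x|), next to the soft thresholding function used for the L(1) penalty. The function names, the scalar objective, and the scaling of `lam` are assumptions made for this example; the paper's "new" operator may differ in its threshold constant or scaling.

```python
import math


def half_threshold(z: float, lam: float) -> float:
    """Scalar half thresholding (hypothetical helper, not the paper's code).

    Returns the minimizer of (x - z)**2 + lam * sqrt(|x|) in the form
    commonly quoted for L(1/2) regularization. Assumes lam >= 0.
    """
    if lam <= 0.0:
        return z  # no penalty, nothing to shrink
    # Inputs at or below this magnitude are set exactly to zero.
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(z) <= threshold:
        return 0.0
    # Above the threshold the acos argument is below 1, so acos is defined.
    phi = math.acos((lam / 8.0) * (abs(z) / 3.0) ** -1.5)
    return (2.0 / 3.0) * z * (1.0 + math.cos(2.0 * (math.pi - phi) / 3.0))


def soft_threshold(z: float, lam: float) -> float:
    """Scalar soft thresholding: minimizer of (x - z)**2 + lam * |x|."""
    return math.copysign(max(abs(z) - lam / 2.0, 0.0), z)


if __name__ == "__main__":
    # Compare how the two operators treat small and large inputs.
    for z in (0.3, 0.8, 2.0, -2.0):
        print(z, half_threshold(z, 1.0), soft_threshold(z, 1.0))
```

With `lam = 1.0`, the half thresholding function zeroes every input whose magnitude is at most about 0.945, whereas soft thresholding zeroes only inputs up to 0.5. For inputs well above the threshold, half thresholding shrinks less than soft thresholding. This is consistent with the abstract's claim that the L(1/2) penalty selects fewer genes, although the sketch does not reproduce the paper's experiments.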
format	Online Article Text
id	pubmed-3718705
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3718705 2013-07-25 Sparse logistic regression with a L(1/2) penalty for gene selection in cancer classification. BMC Bioinformatics. Research Article. BioMed Central 2013-06-19 /pmc/articles/PMC3718705/ /pubmed/23777239 http://dx.doi.org/10.1186/1471-2105-14-198 Text en Copyright © 2013 Liang et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sparse logistic regression with a L(1/2) penalty for gene selection in cancer classification
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3718705/ https://www.ncbi.nlm.nih.gov/pubmed/23777239 http://dx.doi.org/10.1186/1471-2105-14-198
