Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction

The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features su...

Bibliographic Details
Main authors: Tian, Xinyu; Wang, Xuefeng; Chen, Jun
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295837/
https://www.ncbi.nlm.nih.gov/pubmed/25635165
http://dx.doi.org/10.4137/CIN.S17686
author Tian, Xinyu
Wang, Xuefeng
Chen, Jun
collection PubMed
description The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related by an underlying biological network. Efficient use of the network information is important for improving both classification performance and biological interpretability. We proposed a multinomial logit model that addresses both the high dimensionality of the predictors and the underlying network information. A group lasso penalty was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared with models that use no prior structure information, both in simulations and in a cancer subtype prediction problem with real TCGA (The Cancer Genome Atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases.
format Online
Article
Text
id pubmed-4295837
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4295837 2015-01-29 Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction Tian, Xinyu Wang, Xuefeng Chen, Jun Cancer Inform Original Research The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related by an underlying biological network. Efficient use of the network information is important for improving both classification performance and biological interpretability. We proposed a multinomial logit model that addresses both the high dimensionality of the predictors and the underlying network information. A group lasso penalty was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared with models that use no prior structure information, both in simulations and in a cancer subtype prediction problem with real TCGA (The Cancer Genome Atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases. Libertas Academica 2015-01-12 /pmc/articles/PMC4295837/ /pubmed/25635165 http://dx.doi.org/10.4137/CIN.S17686 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295837/
https://www.ncbi.nlm.nih.gov/pubmed/25635165
http://dx.doi.org/10.4137/CIN.S17686
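
The abstract in this record names three ingredients: a group lasso penalty for sparsity, a network constraint for smoothness, and a proximal gradient algorithm to handle the non-smooth objective. The following is a minimal NumPy sketch of how those pieces can fit together; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the grouping (one group per feature, spanning all classes), the network constraint written as a graph-Laplacian quadratic penalty, the fixed step size, and all function and parameter names (`fit`, `lam1`, `lam2`, `lap`).

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def group_soft_threshold(B, t):
    """Proximal operator of t * sum_j ||B[j, :]||_2.

    Each row (assumed here to be one feature's coefficients across
    all classes) is shrunk toward zero and set exactly to zero when
    its norm is at most t.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B

def smooth_part(B, X, Y, lap, lam2):
    """Multinomial negative log-likelihood plus Laplacian penalty.

    Returns the value and gradient of the differentiable part of the
    objective. Y is a one-hot (n, K) matrix; lap is a symmetric
    (p, p) graph Laplacian (an assumed form of the network constraint).
    """
    n = X.shape[0]
    P = softmax(X @ B)
    nll = -np.sum(Y * np.log(P + 1e-12)) / n
    penalty = 0.5 * lam2 * np.sum(B * (lap @ B))
    grad = X.T @ (P - Y) / n + lam2 * (lap @ B)
    return nll + penalty, grad

def objective(B, X, Y, lap, lam1, lam2):
    """Full objective: smooth part plus the group lasso term."""
    value, _ = smooth_part(B, X, Y, lap, lam2)
    return value + lam1 * np.sum(np.linalg.norm(B, axis=1))

def fit(X, Y, lap, lam1, lam2, n_iter=500):
    """Plain proximal gradient descent with a fixed step size.

    The step is 1 / L, where L is a conservative upper bound on the
    Lipschitz constant of the smooth part's gradient.
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n + lam2 * np.linalg.eigvalsh(lap).max()
    step = 1.0 / L
    B = np.zeros((p, Y.shape[1]))
    for _ in range(n_iter):
        _, grad = smooth_part(B, X, Y, lap, lam2)
        # Gradient step on the smooth part, then the group lasso prox.
        B = group_soft_threshold(B - step * grad, step * lam1)
    return B
```

In this sketch, rows of the returned coefficient matrix that are exactly zero correspond to features dropped from every class at once, which is the effect a group penalty is meant to produce; increasing `lam2` pulls the coefficients of features connected in the network toward each other.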