Supervised group Lasso with applications to microarray data analysis


Bibliographic Details
Main authors: Ma, Shuangge; Song, Xiao; Huang, Jian
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1821041/
https://www.ncbi.nlm.nih.gov/pubmed/17316436
http://dx.doi.org/10.1186/1471-2105-8-60
author Ma, Shuangge
Song, Xiao
Huang, Jian
collection PubMed
description BACKGROUND: A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take the cluster structure into consideration. RESULTS: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross-validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross-validation. We apply the proposed method to disease classification and survival analysis with microarray data. CONCLUSION: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.
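The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: a squared-error loss stands in for the classification and survival likelihoods used in the article, cluster labels are taken as given (the K-means and Gap steps are omitted), the tuning parameters `lam1` and `lam2` are fixed rather than chosen by V-fold cross-validation, and the group penalty is weighted by the square root of the group size. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1 via coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]      # remove gene j's current contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

def group_lasso_pg(X, y, groups, lam, n_iter=500):
    """Group Lasso via proximal gradient with group-wise soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    lipschitz = np.linalg.norm(X, 2) ** 2 / n   # largest eigenvalue of X'X/n
    if lipschitz == 0.0:
        return beta
    step = 1.0 / lipschitz
    labels = np.unique(groups)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in labels:
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            # Either the whole cluster is set to zero or it is shrunk jointly.
            z[idx] = 0.0 if norm <= thresh else z[idx] * (1.0 - thresh / norm)
        beta = z
    return beta

def supervised_group_lasso(X, y, clusters, lam1, lam2):
    """Step 1: Lasso within each cluster. Step 2: group Lasso over clusters."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    selected = np.zeros(X.shape[1], dtype=bool)
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        beta_c = lasso_cd(X[:, idx], y, lam1)
        selected[idx[beta_c != 0]] = True       # genes kept within cluster c
    beta = np.zeros(X.shape[1])
    if selected.any():
        beta[selected] = group_lasso_pg(
            X[:, selected], y, clusters[selected], lam2
        )
    return beta
```

Calling `supervised_group_lasso(X, y, clusters, lam1, lam2)` on a centered expression matrix `X` (samples by genes) returns a coefficient vector in which unselected genes and unselected clusters are exactly zero. In practice both penalties would be tuned by cross-validation, as the abstract describes.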
format Text
id pubmed-1821041
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1821041 2007-03-14 Supervised group Lasso with applications to microarray data analysis. Ma, Shuangge; Song, Xiao; Huang, Jian. BMC Bioinformatics. Methodology Article. BioMed Central, 2007-02-22. /pmc/articles/PMC1821041/ /pubmed/17316436 http://dx.doi.org/10.1186/1471-2105-8-60 Text, en.
Copyright © 2007 Ma et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Supervised group Lasso with applications to microarray data analysis
topic Methodology Article