Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data

Feature selection is needed in many modern scientific research problems that use high-dimensional data. A typical example is identifying gene signatures related to a certain disease from high-dimensional gene expression data. The expression of genes may have grouping structures, for examp...

Bibliographic Details
Main authors: Jiang, Lai; Greenwood, Celia M. T.; Yao, Weixin; Li, Longhai
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7297975/
https://www.ncbi.nlm.nih.gov/pubmed/32546735
http://dx.doi.org/10.1038/s41598-020-66466-z
author Jiang, Lai
Greenwood, Celia M. T.
Yao, Weixin
Li, Longhai
collection PubMed
description Feature selection is needed in many modern scientific research problems that use high-dimensional data. A typical example is identifying gene signatures related to a certain disease from high-dimensional gene expression data. The expression of genes may have grouping structures; for example, co-regulated genes with similar biological functions tend to have similar expression levels. It is therefore preferable to take the grouping structure into account when selecting features. In this paper, we propose a Bayesian Robit regression method with Hyper-LASSO priors (abbreviated as BayesHL) for feature selection in high-dimensional genomic data with grouping structure. The main features of BayesHL are that it discards unrelated features more aggressively than LASSO and that it performs feature selection within groups automatically, without a pre-specified grouping structure. We apply BayesHL in gene expression analysis to identify subsets of genes that contribute to the 5-year survival outcome of endometrial cancer (EC) patients. Results show that BayesHL outperforms alternative methods (including LASSO, group LASSO, supervised group LASSO, penalized logistic regression, random forest, neural network, XGBoost and knockoff) in terms of predictive power, sparsity and the ability to uncover grouping structure, and that it provides insight into the mechanisms of multiple genetic pathways leading to differing EC survival outcomes.
format Online
Article
Text
id pubmed-7297975
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7297975 2020-06-18 Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data Jiang, Lai Greenwood, Celia M. T. Yao, Weixin Li, Longhai Sci Rep Article
Nature Publishing Group UK 2020-06-16 /pmc/articles/PMC7297975/ /pubmed/32546735 http://dx.doi.org/10.1038/s41598-020-66466-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data
topic Article