Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models

Bibliographic Details
Main authors: Belhechmi, Shaima; Bin, Riccardo De; Rotolo, Federico; Michiels, Stefan
Format: Online Article Text
Language: English
Published: BioMed Central, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7331150/
https://www.ncbi.nlm.nih.gov/pubmed/32615919
http://dx.doi.org/10.1186/s12859-020-03618-y
Collection: PubMed
Abstract:
BACKGROUND: The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables for a time-to-event outcome in high-dimensional data. However, these selection methods focus on a homogeneous set of variables and do not take into account the case of predictors belonging to functional groups; typically, genomic data can be grouped according to biological pathways or to different types of collected data. Another challenge is that the standard lasso penalization is known to have a high false discovery rate.

RESULTS: We evaluated different penalizations in a Cox model to select grouped variables, in order to further penalize variables that, in addition to having a low effect, belong to a group with a low overall effect, and to favor the selection of variables that, in addition to having a large effect, belong to a group with a large overall effect. We considered the case of prespecified and disjoint groups and proposed several weighting schemes for the adaptive lasso method. In particular, we proposed the product Max Single Wald by Single Wald weighting (MSW*SW), which takes into account information on both the biomarker itself and the group to which it belongs. Through simulations, we compared the selection and prediction ability of our approach with the standard lasso, the composite Minimax Concave Penalty (cMCP), the group exponential lasso (gel), the Integrative L1-Penalized Regression with Penalty Factors (IPF-Lasso), and the Sparse Group Lasso (SGL) methods. In addition, we illustrated the methods using gene expression data of 614 breast cancer patients.

CONCLUSIONS: The adaptive lasso with the MSW*SW weighting method incorporates information from both the grouping structure and the individual variable. It outperformed the competitors by reducing the false discovery rate without severely increasing the false negative rate.
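The MSW*SW weighting described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each variable's univariate Cox Wald statistic Z_j is already available, that the single Wald (SW) weight is 1/|Z_j|, and that the max single Wald (MSW) weight is the reciprocal of the largest |Z_k| within the variable's group, so the product weight is w_j = 1/(|Z_j| * max over k in g(j) of |Z_k|). This record does not say whether the paper uses |Z| or the chi-square form Z². The function names, variable names and Wald values below are hypothetical.

```python
from math import inf


def msw_sw_weights(wald, groups):
    """Hypothetical helper: adaptive-lasso penalty weights under an assumed MSW*SW scheme.

    wald:   dict mapping variable name -> univariate Cox Wald statistic Z_j (assumed given)
    groups: dict mapping variable name -> label of its prespecified, disjoint group
    Returns a dict mapping variable name -> w_j = 1 / (|Z_j| * max_{k in g(j)} |Z_k|).
    """
    # MSW part: largest absolute single Wald statistic within each group.
    group_max = {}
    for var, z in wald.items():
        g = groups[var]
        group_max[g] = max(group_max.get(g, 0.0), abs(z))

    # Product of the SW and MSW weights, i.e. the reciprocal of the product of statistics.
    weights = {}
    for var, z in wald.items():
        product = abs(z) * group_max[groups[var]]
        # A zero statistic yields an infinite weight: the variable is never selected.
        weights[var] = 1.0 / product if product > 0 else inf
    return weights


def weighted_l1_penalty(beta, weights, lam):
    """Adaptive-lasso penalty lam * sum_j w_j * |beta_j|; zero coefficients are skipped
    so that an infinite weight on an unselected variable does not produce nan."""
    return lam * sum(weights[v] * abs(b) for v, b in beta.items() if b != 0.0)


if __name__ == "__main__":
    # Hypothetical Wald statistics for two pathways, A (strong) and B (weaker).
    wald = {"a1": 4.0, "a2": 1.0, "b1": -2.0, "b2": 1.0}
    groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    w = msw_sw_weights(wald, groups)
    print(w)  # {'a1': 0.0625, 'a2': 0.25, 'b1': 0.25, 'b2': 0.5}
    print(weighted_l1_penalty({"a1": 0.8, "a2": 0.0, "b1": -0.4, "b2": 0.0}, w, lam=1.0))
```

With these hypothetical values, a2 and b2 have the same individual statistic (|Z| = 1), yet b2 receives twice the penalty weight (0.5 versus 0.25) because the strongest signal in its group is weaker. This matches the behavior the abstract describes: a low-effect variable in a low-effect group is penalized more. In practice such weights would be passed as per-variable penalty factors to a weighted-lasso Cox fit.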
ID: pubmed-7331150
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics
Publication date: 2020-07-02
License: © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Topic: Methodology Article