
Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent

BACKGROUND: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if th...


Bibliographic Details
Main Authors: Klosa, Jan; Simon, Noah; Westermark, Pål Olof; Liebscher, Volkmar; Wittenburg, Dörte
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7493359/
https://www.ncbi.nlm.nih.gov/pubmed/32933477
http://dx.doi.org/10.1186/s12859-020-03725-w
author Klosa, Jan
Simon, Noah
Westermark, Pål Olof
Liebscher, Volkmar
Wittenburg, Dörte
collection PubMed
description BACKGROUND: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths. RESULTS: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R^2 > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. CONCLUSIONS: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
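The three ingredients named in the abstract (proximal gradient descent, backtracking line search, and warm starts along a penalty grid) can be illustrated with a minimal sketch. It is not the seagull implementation: it covers only the plain lasso penalty, it assumes the loss is scaled as 1/(2n) times the residual sum of squares, and every function name and default value (`soft_threshold`, `lasso_prox_grad`, `lasso_path`, `step`, `shrink`, `tol`) is hypothetical and chosen for illustration.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def residual(X, y, b):
    """Residual vector X b - y for a row-major list-of-lists X."""
    return [sum(x * bj for x, bj in zip(row, b)) - yi for row, yi in zip(X, y)]


def smooth_loss(X, y, b):
    """Smooth part of the objective (assumed scaling): ||X b - y||^2 / (2 n)."""
    r = residual(X, y, b)
    return sum(ri * ri for ri in r) / (2 * len(y))


def gradient(X, y, b):
    """Gradient of smooth_loss with respect to b: X^T (X b - y) / n."""
    r = residual(X, y, b)
    n = len(y)
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(len(b))]


def lasso_prox_grad(X, y, lam, b0, step=1.0, shrink=0.5, max_iter=1000, tol=1e-10):
    """Lasso fit by proximal gradient descent, started from b0."""
    b = list(b0)
    for _ in range(max_iter):
        g = gradient(X, y, b)
        f_b = smooth_loss(X, y, b)
        t = step
        while True:
            # Gradient step on the smooth loss, then the lasso proximal step.
            cand = [soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(b, g)]
            d = [cj - bj for cj, bj in zip(cand, b)]
            # Backtracking: accept t once the quadratic upper bound holds
            # (a small slack guards against floating-point rounding).
            bound = (f_b + sum(gj * dj for gj, dj in zip(g, d))
                     + sum(dj * dj for dj in d) / (2 * t))
            if smooth_loss(X, y, cand) <= bound + 1e-12:
                break
            t *= shrink
        b = cand
        if max(abs(dj) for dj in d) < tol:
            break
    return b


def lasso_path(X, y, lambdas):
    """Regularization path: largest penalty first, each fit warm-started."""
    b = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        b = lasso_prox_grad(X, y, lam, b)  # warm start from the previous solution
        path.append((lam, b))
    return path
```

As a usage example under the same assumptions, `lasso_path([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], [0.5, 2.0])` returns one coefficient vector per penalty value. For this identity design the lasso solution has the closed form of soft-thresholding `y` at `2 * lam`, so the fit at `lam = 2.0` is all zeros and the fit at `lam = 0.5` should approach `[2.0, 0.0]`. Group lasso and sparse-group lasso differ only in the proximal step, which would replace `soft_threshold` with a group-wise shrinkage.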
format Online
Article
Text
id pubmed-7493359
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7493359 2020-09-16 Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent Klosa, Jan Simon, Noah Westermark, Pål Olof Liebscher, Volkmar Wittenburg, Dörte BMC Bioinformatics Software
BioMed Central 2020-09-15 /pmc/articles/PMC7493359/ /pubmed/32933477 http://dx.doi.org/10.1186/s12859-020-03725-w Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent
topic Software