
eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models

BACKGROUND: Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is contr...


Bibliographic Details
Main Authors: Candia, Julián; Tsang, John S
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6469092/
https://www.ncbi.nlm.nih.gov/pubmed/30991955
http://dx.doi.org/10.1186/s12859-019-2778-5
_version_ 1783411573683912704
author Candia, Julián
Tsang, John S
collection PubMed
description BACKGROUND: Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is controlled by a penalty parameter λ. The elastic net introduces a mixing parameter α to tune the shrinkage continuously from ridge to lasso. Selecting α objectively and determining which features contributed significantly to prediction after model fitting remain a practical challenge given the paucity of available software to evaluate performance and statistical significance. RESULTS: eNetXplorer builds on top of glmnet to address the above issues for linear (Gaussian), binomial (logistic), and multinomial GLMs. It provides new functionalities to empower practical applications by using a cross validation framework that assesses the predictive performance and statistical significance of a family of elastic net models (as α is varied) and of the corresponding features that contribute to prediction. The user can select which quality metrics to use to quantify the concordance between predicted and observed values, with defaults provided for each GLM. Statistical significance for each model (as defined by α) is determined based on comparison to a set of null models generated by random permutations of the response; the same permutation-based approach is used to evaluate the significance of individual features. In the analysis of large and complex biological datasets, such as transcriptomic and proteomic data, eNetXplorer provides summary statistics, output tables, and visualizations to help assess which subset(s) of features have predictive value for a set of response measurements, and to what extent those subset(s) of features can be expanded or reduced via regularization. 
CONCLUSIONS: This package presents a framework and software for exploratory data analysis and visualization. By making regularized GLMs more accessible and interpretable, eNetXplorer guides the process to generate hypotheses based on features significantly associated with biological phenotypes of interest, e.g. to identify biomarkers for therapeutic responsiveness. eNetXplorer is also generally applicable to any research area that may benefit from predictive modeling and feature identification using regularized GLMs. The package is available under GPL-3 license at the CRAN repository, https://CRAN.R-project.org/package=eNetXplorer. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2778-5) contains supplementary material, which is available to authorized users.
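For reference, the roles of λ and α described in the abstract can be written out for the Gaussian case. This is the standard elastic net objective as parameterized in glmnet, on which the package builds; the symbols N, x_i, y_i, β_0 and β are notation assumed here and do not appear in the record itself.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```

Setting α = 0 leaves only the squared (ridge) penalty and α = 1 leaves only the absolute-value (lasso) penalty, so intermediate values of α interpolate between the two while λ sets the overall penalty strength.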
format Online
Article
Text
id pubmed-6469092
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-64690922019-04-23 eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models Candia, Julián Tsang, John S BMC Bioinformatics Software BACKGROUND: Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is controlled by a penalty parameter λ. The elastic net introduces a mixing parameter α to tune the shrinkage continuously from ridge to lasso. Selecting α objectively and determining which features contributed significantly to prediction after model fitting remain a practical challenge given the paucity of available software to evaluate performance and statistical significance. RESULTS: eNetXplorer builds on top of glmnet to address the above issues for linear (Gaussian), binomial (logistic), and multinomial GLMs. It provides new functionalities to empower practical applications by using a cross validation framework that assesses the predictive performance and statistical significance of a family of elastic net models (as α is varied) and of the corresponding features that contribute to prediction. The user can select which quality metrics to use to quantify the concordance between predicted and observed values, with defaults provided for each GLM. Statistical significance for each model (as defined by α) is determined based on comparison to a set of null models generated by random permutations of the response; the same permutation-based approach is used to evaluate the significance of individual features. 
In the analysis of large and complex biological datasets, such as transcriptomic and proteomic data, eNetXplorer provides summary statistics, output tables, and visualizations to help assess which subset(s) of features have predictive value for a set of response measurements, and to what extent those subset(s) of features can be expanded or reduced via regularization. CONCLUSIONS: This package presents a framework and software for exploratory data analysis and visualization. By making regularized GLMs more accessible and interpretable, eNetXplorer guides the process to generate hypotheses based on features significantly associated with biological phenotypes of interest, e.g. to identify biomarkers for therapeutic responsiveness. eNetXplorer is also generally applicable to any research area that may benefit from predictive modeling and feature identification using regularized GLMs. The package is available under GPL-3 license at the CRAN repository, https://CRAN.R-project.org/package=eNetXplorer. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2778-5) contains supplementary material, which is available to authorized users. BioMed Central 2019-04-16 /pmc/articles/PMC6469092/ /pubmed/30991955 http://dx.doi.org/10.1186/s12859-019-2778-5 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6469092/
https://www.ncbi.nlm.nih.gov/pubmed/30991955
http://dx.doi.org/10.1186/s12859-019-2778-5
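The abstract in this record says that significance is assessed by comparison with null models generated from random permutations of the response. The idea of a permutation-based p-value can be sketched as follows. This is a simplified illustration, not the package's procedure: it does not refit any model or use cross-validation, it uses the Pearson correlation between predicted and observed values as an assumed quality metric, and the function names and example data are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(observed, predicted, n_perm=1000, seed=0):
    """Return (statistic, p-value) for the predicted/observed concordance.

    Assumption: the null distribution is built by shuffling the observed
    response against fixed predictions, rather than by refitting a model
    on each permuted response.
    """
    rng = random.Random(seed)
    stat = pearson(predicted, observed)
    shuffled = list(observed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random permutation of the response
        if pearson(predicted, shuffled) >= stat:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return stat, (exceed + 1) / (n_perm + 1)


# Hypothetical example data, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.9, 8.2]
stat, p = permutation_pvalue(observed, predicted, n_perm=500, seed=1)
print(stat, p)
```

The add-one correction is a common convention for permutation tests; the fixed seed only makes the sketch reproducible.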