
On the impact of model selection on predictor identification and parameter inference

We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors and the impact of predictor selection on parameter inference for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algo...


Bibliographic Details
Main authors:	Pfeiffer, Ruth M., Redd, Andrew, Carroll, Raymond J.
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2016
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5480098/ https://www.ncbi.nlm.nih.gov/pubmed/28690368 http://dx.doi.org/10.1007/s00180-016-0690-2

_version_	1783245234612731904
author	Pfeiffer, Ruth M. Redd, Andrew Carroll, Raymond J.
author_facet	Pfeiffer, Ruth M. Redd, Andrew Carroll, Raymond J.
author_sort	Pfeiffer, Ruth M.
collection	PubMed
description	We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors and the impact of predictor selection on parameter inference for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algorithm 1), or by refitting selected predictors with standard regression (Algorithm 2). For linear models, penalized linear regression, elastic net, smoothly clipped absolute deviation (SCAD), least angle regression and LASSO had low false negative (FN) predictor selection rates but false positive (FP) rates above 20 % for all sample and effect sizes. Partial least squares regression had few FPs but many FNs. Only relaxo had low FP and FN rates. For logistic models, LASSO and penalized logistic regression had many FPs and few FNs for all sample and effect sizes. SCAD and adaptive logistic regression had low or moderate FP rates but many FNs. 95 % confidence interval coverage of predictors with null effects was approximately 100 % for Algorithm 1 for all methods, and 95 % for Algorithm 2 for large sample and effect sizes. Coverage was low only for penalized partial least squares (linear regression). For outcome-associated predictors, coverage was close to 95 % for Algorithm 2 for large sample and effect sizes for all methods except penalized partial least squares and penalized logistic regression. Coverage was sub-nominal for Algorithm 1. In conclusion, many methods performed comparably, and while Algorithm 2 is preferred to Algorithm 1 for estimation, it yields valid inference only for large effect and sample sizes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00180-016-0690-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5480098
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-5480098 2017-07-06 On the impact of model selection on predictor identification and parameter inference Pfeiffer, Ruth M. Redd, Andrew Carroll, Raymond J. Comput Stat Original Paper Springer Berlin Heidelberg 2016-10-22 2017 /pmc/articles/PMC5480098/ /pubmed/28690368 http://dx.doi.org/10.1007/s00180-016-0690-2 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle	Original Paper Pfeiffer, Ruth M. Redd, Andrew Carroll, Raymond J. On the impact of model selection on predictor identification and parameter inference
title	On the impact of model selection on predictor identification and parameter inference
title_full	On the impact of model selection on predictor identification and parameter inference
title_fullStr	On the impact of model selection on predictor identification and parameter inference
title_full_unstemmed	On the impact of model selection on predictor identification and parameter inference
title_short	On the impact of model selection on predictor identification and parameter inference
title_sort	on the impact of model selection on predictor identification and parameter inference
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5480098/ https://www.ncbi.nlm.nih.gov/pubmed/28690368 http://dx.doi.org/10.1007/s00180-016-0690-2
work_keys_str_mv	AT pfeifferruthm ontheimpactofmodelselectiononpredictoridentificationandparameterinference AT reddandrew ontheimpactofmodelselectiononpredictoridentificationandparameterinference AT carrollraymondj ontheimpactofmodelselectiononpredictoridentificationandparameterinference
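
The description contrasts two ways of obtaining effect estimates: taking them directly from the penalized fit (Algorithm 1) or refitting only the selected predictors with standard regression (Algorithm 2). The following is a minimal sketch of that contrast for a linear model, not the authors' simulation code. The coordinate-descent LASSO solver, the toy data, the penalty value `lam=0.5`, and all function names are assumptions made for illustration, and the sketch covers point estimates only, not the confidence interval coverage the paper evaluates.

```python
# Illustrative sketch only: solver, data and penalty value are assumptions,
# not taken from the paper.

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - Xb||^2 + lam*||b||_1 by coordinate descent.
    No intercept; assumes every column of X is non-zero."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Partial residual: remove the fit of every predictor except j.
                fit_others = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
            norm = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(rho / n, lam) / (norm / n)
    return b

def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]

def select_then_refit(X, y, lam):
    """Return penalized estimates, selected indices, and refitted estimates."""
    b_pen = lasso_cd(X, y, lam)                     # Algorithm 1: penalized estimates
    selected = [j for j, bj in enumerate(b_pen) if bj != 0.0]
    b_refit = [0.0] * len(b_pen)
    if selected:
        X_sel = [[row[j] for j in selected] for row in X]
        for j, bj in zip(selected, ols(X_sel, y)):  # Algorithm 2: unpenalized refit
            b_refit[j] = bj
    return b_pen, selected, b_refit

# Toy orthogonal design: y = 3*x1 + 0.1*x2, no noise.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.1, 2.9, -2.9, -3.1]
b_pen, selected, b_refit = select_then_refit(X, y, lam=0.5)
print(b_pen, selected, b_refit)
```

On this toy design the penalized estimate of the first coefficient is shrunk to about 2.5, while the refit on the selected predictor recovers about 3.0. The second predictor, whose true effect is small but non-zero, is dropped entirely, which is a false negative in the paper's terminology.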
