Review and evaluation of penalised regression methods for risk prediction in low‐dimensional data with few events

Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low‐dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of ev...

Bibliographic Details
Main authors: Pavlou, Menelaos; Ambler, Gareth; Seaman, Shaun; De Iorio, Maria; Omar, Rumana Z
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2015
Subjects: Special Issue Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982098/
https://www.ncbi.nlm.nih.gov/pubmed/26514699
http://dx.doi.org/10.1002/sim.6782
author Pavlou, Menelaos
Ambler, Gareth
Seaman, Shaun
De Iorio, Maria
Omar, Rumana Z
collection PubMed
description Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low‐dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluated their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. Their use may aid variable selection, and they can be easily extended to clustered‐data settings and to incorporate external information. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
format Online
Article
Text
id pubmed-4982098
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-49820982016-08-26 Review and evaluation of penalised regression methods for risk prediction in low‐dimensional data with few events Pavlou, Menelaos Ambler, Gareth Seaman, Shaun De Iorio, Maria Omar, Rumana Z Stat Med Special Issue Papers Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting low‐dimensional binary outcomes typically arising in epidemiology, health services and public health research where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluated their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. 
Their use may aid variable selection, and they can be easily extended to clustered‐data settings and to incorporate external information. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. John Wiley and Sons Inc. 2015-10-29 2016-03-30 /pmc/articles/PMC4982098/ /pubmed/26514699 http://dx.doi.org/10.1002/sim.6782 Text en © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Review and evaluation of penalised regression methods for risk prediction in low‐dimensional data with few events
topic Special Issue Papers