Performance of Firth-and logF-type penalized methods in risk prediction for small or sparse binary data

BACKGROUND: When developing risk models for binary data with small or sparse data sets, standard maximum likelihood estimation (MLE) based logistic regression faces several problems, including biased or infinite estimates of the regression coefficients and frequent convergence failure of the likeli...

Bibliographic Details
Main Authors:	Rahman, M. Shafiqur, Sultana, Mahbuba
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324225/ https://www.ncbi.nlm.nih.gov/pubmed/28231767 http://dx.doi.org/10.1186/s12874-017-0313-9

author	Rahman, M. Shafiqur; Sultana, Mahbuba
collection	PubMed
description	BACKGROUND: When developing risk models for binary data with small or sparse data sets, standard maximum likelihood estimation (MLE) based logistic regression faces several problems, including biased or infinite estimates of the regression coefficients and frequent convergence failure of the likelihood due to separation. Separation commonly occurs even when the sample size is large, provided there is a sufficient number of strong predictors. In the presence of separation, even if a model can be fitted, it is overfitted and has poor predictive performance. Firth- and logF-type penalized regression methods are popular alternatives to MLE, particularly for solving the separation problem. Despite their attractive advantages, their use in risk prediction is very limited. This paper evaluated these methods in risk prediction in comparison with MLE and other commonly used penalized methods such as ridge. METHODS: The predictive performance of the methods was evaluated by assessing calibration, discrimination and overall predictive performance in an extensive simulation study. In addition, the methods were illustrated using a real data example with a low prevalence of the outcome. RESULTS: The MLE showed poor performance in risk prediction in small or sparse data sets. All penalized methods offered some improvement in calibration, discrimination and overall predictive performance. Although the Firth- and logF-type methods showed almost equal improvement, Firth-type penalization produced some bias in the average predicted probability, and this bias was even larger than that produced by MLE. Of the logF(1,1) and logF(2,2) penalizations, logF(2,2) gave a slightly biased estimate of the regression coefficient of a binary predictor, and logF(1,1) performed better in all aspects. Ridge likewise performed well in discrimination and overall predictive performance, but it often produced an underfitted model and had a high rate of convergence failure (even higher than that of MLE), probably due to the separation problem. CONCLUSIONS: The logF-type penalized method, particularly logF(1,1), could be used in practice when developing risk models for small or sparse data sets.
format	Online Article Text
id	pubmed-5324225
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5324225 2017-03-01 Performance of Firth-and logF-type penalized methods in risk prediction for small or sparse binary data Rahman, M. Shafiqur; Sultana, Mahbuba BMC Med Res Methodol Research Article BioMed Central 2017-02-23 /pmc/articles/PMC5324225/ /pubmed/28231767 http://dx.doi.org/10.1186/s12874-017-0313-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Performance of Firth-and logF-type penalized methods in risk prediction for small or sparse binary data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5324225/ https://www.ncbi.nlm.nih.gov/pubmed/28231767 http://dx.doi.org/10.1186/s12874-017-0313-9
