Bias in Odds Ratios From Logistic Regression Methods With Sparse Data Sets

BACKGROUND: Logistic regression models are widely used to evaluate the association between a binary outcome and a set of covariates. However, when there are few study participants at the outcome and covariate levels, the models lead to bias of the odds ratio (OR) estimated using the maximum likeliho...

Bibliographic Details
Main Authors:	Gosho, Masahiko; Ohigashi, Tomohiro; Nagashima, Kengo; Ito, Yuri; Maruo, Kazushi
Format:	Online Article Text
Language:	English
Published:	Japan Epidemiological Association 2023
Subjects:	Review Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10165217/ https://www.ncbi.nlm.nih.gov/pubmed/34565762 http://dx.doi.org/10.2188/jea.JE20210089

_version_	1785038223208087552
author	Gosho, Masahiko; Ohigashi, Tomohiro; Nagashima, Kengo; Ito, Yuri; Maruo, Kazushi
author_sort	Gosho, Masahiko
collection	PubMed
description	BACKGROUND: Logistic regression models are widely used to evaluate the association between a binary outcome and a set of covariates. However, when there are few study participants at the outcome and covariate levels, the models lead to bias of the odds ratio (OR) estimated using the maximum likelihood (ML) method. This bias is known as sparse data bias, and the estimated OR can yield impossibly large values because of data sparsity. However, this bias has been ignored in most epidemiological studies. METHODS: We review several methods for reducing sparse data bias in logistic regression. The primary aim is to evaluate the Bayesian methods in comparison with the classical methods, such as the ML, Firth’s, and exact methods using a simulation study. We also apply these methods to a real data set. RESULTS: Our simulation results indicate that the bias of the OR from the ML, Firth’s, and exact methods is considerable. Furthermore, the Bayesian methods with hyper-g prior modeling of the prior covariance matrix for regression coefficients reduced the bias under the null hypothesis, whereas the Bayesian methods with log F-type priors reduced the bias under the alternative hypothesis. CONCLUSION: The Bayesian methods using log F-type priors and hyper-g prior are superior to the ML, Firth’s, and exact methods when fitting logistic models to sparse data sets. The choice of a preferable method depends on the null and alternative hypothesis. Sensitivity analysis is important to understand the robustness of the results in sparse data analysis.
format	Online Article Text
id	pubmed-10165217
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Japan Epidemiological Association
record_format	MEDLINE/PubMed
spelling	pubmed-10165217 2023-06-05 J Epidemiol Review Article Japan Epidemiological Association Text en © 2022 Masahiko Gosho et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Bias in Odds Ratios From Logistic Regression Methods With Sparse Data Sets
topic	Review Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10165217/ https://www.ncbi.nlm.nih.gov/pubmed/34565762 http://dx.doi.org/10.2188/jea.JE20210089
work_keys_str_mv	AT goshomasahiko biasinoddsratiosfromlogisticregressionmethodswithsparsedatasets AT ohigashitomohiro biasinoddsratiosfromlogisticregressionmethodswithsparsedatasets AT nagashimakengo biasinoddsratiosfromlogisticregressionmethodswithsparsedatasets AT itoyuri biasinoddsratiosfromlogisticregressionmethodswithsparsedatasets AT maruokazushi biasinoddsratiosfromlogisticregressionmethodswithsparsedatasets
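
The sparse data bias described in the abstract can be illustrated on a single 2x2 table. The following is a minimal sketch and not the authors' simulation code: the table counts, the function names, and the choice of a log-F(1,1) prior on the log OR alone (with no prior on the intercept) are assumptions made for illustration. It compares the ML odds ratio, the Firth-type estimate (which, for a saturated two-group model, reduces to adding 0.5 to each cell), and a posterior-mode estimate under a log-F(m,m) prior obtained by penalized likelihood.

```python
import math


def expit(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    # log(1 + exp(x)) computed without overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ml_odds_ratio(a, b, c, d):
    """ML OR for a 2x2 table: a/b = exposed events/non-events, c/d = unexposed."""
    if b * c == 0:
        # A zero cell in the denominator sends the ML estimate to infinity.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def firth_odds_ratio(a, b, c, d):
    """Firth-type OR for the saturated two-group model (0.5 added to each cell)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def logf_odds_ratio(a, b, c, d, m=1.0, tol=1e-10, max_iter=200):
    """Posterior-mode OR with a log-F(m, m) prior on the log OR (assumed setup).

    Maximizes loglik(b0, b1) + (m/2)*b1 - m*log(1 + exp(b1)) by damped Newton
    steps. Setting m=0 gives the ordinary ML fit.
    """
    n1, n0 = a + b, c + d

    def objective(b0, b1):
        return (a * (b0 + b1) - n1 * softplus(b0 + b1)
                + c * b0 - n0 * softplus(b0)
                + 0.5 * m * b1 - m * softplus(b1))

    b0 = b1 = 0.0
    for _ in range(max_iter):
        p1, p0, q = expit(b0 + b1), expit(b0), expit(b1)
        # Gradient of the penalized log-likelihood.
        g0 = a - n1 * p1 + c - n0 * p0
        g1 = a - n1 * p1 + 0.5 * m - m * q
        # Negative Hessian (2x2), inverted in closed form.
        w1, w0 = n1 * p1 * (1 - p1), n0 * p0 * (1 - p0)
        h00, h01, h11 = w1 + w0, w1, w1 + m * q * (1 - q)
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the objective does not decrease.
        step, current = 1.0, objective(b0, b1)
        while (objective(b0 + step * s0, b1 + step * s1) < current - 1e-12
               and step > 1e-8):
            step /= 2.0
        b0 += step * s0
        b1 += step * s1
        if abs(step * s0) + abs(step * s1) < tol:
            break
    return math.exp(b1)


if __name__ == "__main__":
    # Hypothetical sparse table: 3/3 exposed and 2/10 unexposed have the event.
    table = (3, 0, 2, 8)
    print("ML OR:        ", ml_odds_ratio(*table))
    print("Firth-type OR:", firth_odds_ratio(*table))
    print("log-F(1,1) OR:", logf_odds_ratio(*table, m=1.0))
```

On this hypothetical table the ML estimate is infinite because one cell is empty, while both penalized estimates stay finite. Varying `m` gives a simple form of the sensitivity analysis that the abstract recommends.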
