
Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size


Bibliographic Details
Main Authors: Šinkovec, Hana; Geroldinger, Angelika; Heinze, Georg
Format: Online Article Text
Language: English
Published: MDPI, 2019
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6926877/
https://www.ncbi.nlm.nih.gov/pubmed/31766753
http://dx.doi.org/10.3390/ijerph16234658
author Šinkovec, Hana
Geroldinger, Angelika
Heinze, Georg
collection PubMed
description The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’ as the two outcome groups are separated by the values of a covariate or a linear combination of covariates. To overcome the problem of non-existing ML parameter estimates, applying Firth’s correction (FC) was proposed. In practice, however, a principal investigator might be advised to ‘bring more data’ in order to solve a separation issue. We illustrate the problem by means of examples from colorectal cancer screening and ornithology. It is unclear if such an increasing sample size (ISS) strategy that keeps sampling new observations until separation is removed improves estimation compared to applying FC to the original data set. We performed an extensive simulation study where the main focus was to estimate the cost-adjusted relative efficiency of ML combined with ISS compared to FC. FC yielded reasonably small root mean squared errors and proved to be the more efficient estimator. Given our findings, we propose not to adapt the sample size when separation is encountered but to use FC as the default method of analysis whenever the number of observations or outcome events is critically low.
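The separation problem and Firth's correction described in the abstract can be illustrated with a small numerical sketch. The following Python code is not the authors' implementation; it is a minimal illustration, with an assumed function name, toy data set, and Newton-Raphson fitting routine, of how Firth's penalized likelihood yields finite coefficient estimates on a perfectly separated sample where ordinary maximum likelihood estimates do not exist.

import numpy as np

def firth_logistic(X, y, max_iter=50, tol=1e-8):
    # Firth-penalized logistic regression fitted by Newton-Raphson (illustrative sketch).
    # X: (n, p) design matrix including an intercept column; y: (n,) 0/1 outcomes.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))            # fitted probabilities
        w = pi * (1.0 - pi)                               # IRLS weights
        info = X.T @ (w[:, None] * X)                     # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        h = np.einsum("ij,jk,ik->i", X, info_inv, X) * w  # hat-matrix diagonals
        # Firth-modified score: X'(y - pi + h(1/2 - pi)); plain ML omits the h term
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated toy data: every x < 0 has y = 0 and every x > 0 has y = 1,
# so the ML slope estimate does not exist (it drifts toward infinity).
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

print(firth_logistic(X, y))   # finite intercept and slope despite separation

In practice one would use an established implementation of Firth's correction (for example, the R package logistf used in this literature) rather than hand-rolled code; the sketch only shows why the penalized score keeps the estimates finite.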
format Online
Article
Text
id pubmed-6926877
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6926877 2019-12-23. Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size. Šinkovec, Hana; Geroldinger, Angelika; Heinze, Georg. Int J Environ Res Public Health, Article. MDPI, published online 2019-11-22 (2019-12 issue). /pmc/articles/PMC6926877/ /pubmed/31766753 http://dx.doi.org/10.3390/ijerph16234658 Text en. © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6926877/
https://www.ncbi.nlm.nih.gov/pubmed/31766753
http://dx.doi.org/10.3390/ijerph16234658