
Penalized logistic regression with low prevalence exposures beyond high dimensional settings

Bibliographic Details
Main authors: Doerken, Sam; Avalos, Marta; Lagarde, Emmanuel; Schumacher, Martin
Format: Online article (text)
Language: English
Published: Public Library of Science, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6527211/
https://www.ncbi.nlm.nih.gov/pubmed/31107924
http://dx.doi.org/10.1371/journal.pone.0217057
Collection: PubMed
Abstract: Estimating and selecting risk factors with extremely low prevalences of exposure for a binary outcome is a challenge because classical techniques, notably logistic regression, often fail to provide meaningful results in such settings. While penalized regression methods are widely used in high-dimensional settings, we show that they are useful in low-dimensional settings as well. Specifically, we demonstrate that Firth correction, ridge, the lasso and boosting all improve the estimation for low-prevalence risk factors. Although the methods themselves are well established, comparison studies are needed to assess their potential benefits in this context. We carry out such a comparison using the dataset of a large unmatched case-control study from France (2005-2008) on the relationship between prescription medicines and road traffic accidents, together with an accompanying simulation study. Results show that the estimation of risk factors with prevalences below 0.1% can be drastically improved by using Firth correction and boosting in particular, especially for ultra-low prevalences. When a moderate number of low-prevalence exposures is available, we recommend the use of penalized techniques.
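
As a rough illustration of why penalization helps with a very rare exposure, here is a minimal Python sketch that fits a one-exposure logistic model to hypothetical toy data (5,000 subjects, 2 exposed, both of them cases, i.e. 0.04% prevalence and quasi-complete separation). It compares plain maximum likelihood (ML), Firth's correction (Newton-type iterations on the modified score y_i - p_i + h_i(1/2 - p_i), where h_i are the hat-matrix diagonals) and a ridge penalty on the exposure coefficient. The toy data, the helper name `fit_logistic` and the ridge weight 1.0 are assumptions made for this sketch; they are not the study's data, code or tuning, and the lasso and boosting are not shown.

```python
import math


def _sigmoid(eta):
    # Numerically stable logistic function 1 / (1 + exp(-eta))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(x, y, firth=False, ridge=0.0, max_iter=100, tol=1e-10):
    """Hypothetical helper: fit logit P(y=1) = b0 + b1*x by Newton-type steps.

    firth=True uses Firth's modified score; ridge > 0 penalizes (ridge/2)*b1**2
    (the intercept is not penalized). Returns ((b0, b1), converged).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [_sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X^T W X for the design columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        if det <= 0.0:
            # Information matrix numerically singular: the estimates diverge
            return (b0, b1), False
        r = [yi - pi for yi, pi in zip(y, p)]
        if firth:
            # Hat-matrix diagonals h_i = w_i * x_i^T (X^T W X)^{-1} x_i
            h = [wi * (i11 - 2.0 * i01 * xi + i00 * xi * xi) / det
                 for wi, xi in zip(w, x)]
            # Firth's modified score contribution: y_i - p_i + h_i * (1/2 - p_i)
            r = [ri + hi * (0.5 - pi) for ri, hi, pi in zip(r, h, p)]
        # (Penalized) score vector
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x)) - ridge * b1
        # Solve (I + ridge * diag(0, 1)) step = u for the 2x2 system
        j11 = i11 + ridge
        d = i00 * j11 - i01 * i01
        step0 = (j11 * u0 - i01 * u1) / d
        step1 = (-i01 * u0 + i00 * u1) / d
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return (b0, b1), True
    return (b0, b1), False


# Hypothetical toy data: 5,000 subjects, 2 exposed (0.04% prevalence),
# both exposed subjects are cases -> quasi-complete separation.
x = [1] * 2 + [0] * 4998
y = [1] * 2 + [1] * 500 + [0] * 4498

ml, ml_ok = fit_logistic(x, y, max_iter=25)
fi, fi_ok = fit_logistic(x, y, firth=True)
rd, rd_ok = fit_logistic(x, y, ridge=1.0)  # ridge weight chosen arbitrarily
print(f"ML:    b1 = {ml[1]:.2f}, converged = {ml_ok}")
print(f"Firth: b1 = {fi[1]:.3f}, converged = {fi_ok}")
print(f"Ridge: b1 = {rd[1]:.3f}, converged = {rd_ok}")
```

With a single binary exposure, the Firth estimate of b1 should coincide with the log odds ratio computed after adding 0.5 to each cell of the 2×2 table, while the ML iterates keep growing instead of converging. In practice the ridge weight would be tuned (for example by cross-validation) rather than fixed.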
Record ID: pubmed-6527211
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2019-05-20
Copyright: © 2019 Doerken et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.