
Purposeful selection of variables in logistic regression

BACKGROUND: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable se...


Bibliographic Details
Main authors: Bursac, Zoran; Gauss, C Heath; Williams, David Keith; Hosmer, David W
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2633005/
https://www.ncbi.nlm.nih.gov/pubmed/19087314
http://dx.doi.org/10.1186/1751-0473-3-17
author Bursac, Zoran
Gauss, C Heath
Williams, David Keith
Hosmer, David W
collection PubMed
description BACKGROUND: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. Several variable selection algorithms exist, but they are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates in which an analyst makes a variable selection decision at each step of the modeling process. METHODS: In this paper we introduce an algorithm that automates that process. We conduct a simulation study to compare the performance of this algorithm with three well-documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. RESULTS: We show that this approach is advantageous when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure can retain important confounding variables, potentially resulting in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. CONCLUSION: An analyst who needs an algorithm to help guide the retention of significant covariates as well as confounding ones should consider this macro as an alternative tool.
format Text
id pubmed-2633005
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2633005 2009-01-30 Purposeful selection of variables in logistic regression Bursac, Zoran Gauss, C Heath Williams, David Keith Hosmer, David W Source Code Biol Med Research BioMed Central 2008-12-16 /pmc/articles/PMC2633005/ /pubmed/19087314 http://dx.doi.org/10.1186/1751-0473-3-17 Text en Copyright © 2008 Bursac et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Purposeful selection of variables in logistic regression
topic Research