
Variable selection – A review and recommendations for the practicing statistician

Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well‐established if the set of independent variables to consider i...


Bibliographic Details
Main authors:	Heinze, Georg; Wallisch, Christine; Dunkler, Daniela
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2018
Subjects:	Biometry in Practice
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5969114/ https://www.ncbi.nlm.nih.gov/pubmed/29292533 http://dx.doi.org/10.1002/bimj.201700067

author	Heinze, Georg; Wallisch, Christine; Dunkler, Daniela
collection	PubMed
description	Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well‐established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and often we are confronted with the number of candidate variables in the range 10–30. This number is often too large to be considered in a statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change‐in‐estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise stability of a final model, unbiasedness of regression coefficients, and validity of p‐values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on application of variable selection methods in general (low‐dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities based on resampling the entire variable selection process to be routinely reported by software packages offering automated variable selection algorithms.
format	Online Article Text
id	pubmed-5969114
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-5969114 2018-05-30 Variable selection – A review and recommendations for the practicing statistician Heinze, Georg Wallisch, Christine Dunkler, Daniela Biom J Biometry in Practice John Wiley and Sons Inc.
2018-01-02 2018-05 /pmc/articles/PMC5969114/ /pubmed/29292533 http://dx.doi.org/10.1002/bimj.201700067 Text en © 2017 The Authors. Biometrical Journal Published by WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title	Variable selection – A review and recommendations for the practicing statistician
topic	Biometry in Practice
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5969114/ https://www.ncbi.nlm.nih.gov/pubmed/29292533 http://dx.doi.org/10.1002/bimj.201700067
