Cargando…

Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

BACKGROUND: In statistical modeling, finding the most favorable coding for an exploratory quantitative variable involves many tests. This process involves multiple testing problems and requires the correction of the significance level. METHODS: For each coding, a test on the nullity of the coefficie...

Descripción completa

Detalles Bibliográficos
Autores principales: Liquet, Benoit, Riou, Jérémie
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3699399/
https://www.ncbi.nlm.nih.gov/pubmed/23758852
http://dx.doi.org/10.1186/1471-2288-13-75
Descripción
Sumario:BACKGROUND: In statistical modeling, finding the most favorable coding for an exploratory quantitative variable involves many tests. This process involves multiple testing problems and requires the correction of the significance level. METHODS: For each coding, a test on the nullity of the coefficient associated with the new coded variable is computed. The selected coding corresponds to that associated with the largest statistical test (or equivalently the smallest p(value)). In the context of the Generalized Linear Model, Liquet and Commenges (Stat Probability Lett,71:33–38,2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, has been developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels and, by definition those that involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect between an explanatory and a dependent variable. RESULTS: The simulations we ran in this study showed good performances of the proposed methods. These methods were illustrated using the data from a study of the relationship between cholesterol and dementia. CONCLUSION: The algorithms were implemented using R, and the associated CPMCGLM R package is available on the CRAN.