Spurious interaction as a result of categorization

Bibliographic details
Main author: Thoresen, Magne
Format: Online article (text)
Language: English
Published: BioMed Central 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6367751/
https://www.ncbi.nlm.nih.gov/pubmed/30732587
http://dx.doi.org/10.1186/s12874-019-0667-2
Collection: PubMed
Abstract:
BACKGROUND: It is common in applied epidemiological and clinical research to convert continuous variables into categorical variables by grouping values into categories. Such categorized variables are then often used as exposure variables in some regression model. There are numerous statistical arguments why this practice should be avoided, and in this paper we present yet another such argument.
METHODS: We show that categorization may lead to spurious interaction in multiple regression models. We give precise analytical expressions for when this may happen in the linear regression model with normally distributed exposure variables, and we show by simulations that the analytical results are valid also for other distributions. Further, we give an interpretation of the results in terms of a measurement error problem.
RESULTS: We show that, in the case of a linear model with two normally distributed exposure variables, both categorized at the same cut point, a spurious interaction will be induced unless the two variables are categorized at the median or they are uncorrelated. In simulations with exposure variables following other distributions, we confirm this general effect of categorization, but we also show that the effect of the choice of cut point varies over different distributions.
CONCLUSION: Categorization of continuous exposure variables leads to a number of problems, among them spurious interaction effects. Hence, this practice should be avoided and other methods should be considered.
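The RESULTS statement can be illustrated with a small simulation. The sketch below is not the author's code; it assumes a hypothetical setup in which the outcome is additive in two standard normal exposures (so the true interaction is zero), both exposures are dichotomized at the same cut point, and the fitted model is the saturated one, Y ~ D1 + D2 + D1*D2. Because least squares in that saturated model reproduces the four cell means, the fitted interaction coefficient equals the contrast mean(1,1) - mean(1,0) - mean(0,1) + mean(0,0), which the hypothetical function `spurious_interaction` estimates directly. The coefficients, noise level, sample size, and seed are illustrative assumptions.

```python
import math
import random


def spurious_interaction(rho, cut, beta1=1.0, beta2=1.0, sigma=1.0,
                         n=200_000, seed=2019):
    """Estimate the interaction coefficient induced by dichotomizing.

    Hypothetical data-generating model (an assumption, not the paper's code):
        Y = beta1*X1 + beta2*X2 + eps,
        (X1, X2) standard bivariate normal with correlation rho,
        eps ~ N(0, sigma^2), independent of (X1, X2).
    The true model has no interaction. Both exposures are cut at the same
    point: D1 = 1{X1 > cut}, D2 = 1{X2 > cut}.

    In the saturated model Y ~ D1 + D2 + D1*D2, least squares reproduces the
    four cell means, so the fitted interaction coefficient equals
        mean(1,1) - mean(1,0) - mean(0,1) + mean(0,0).
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    sums = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + s * rng.gauss(0.0, 1.0)  # Var(x2) = 1, corr(x1, x2) = rho
        y = beta1 * x1 + beta2 * x2 + rng.gauss(0.0, sigma)
        cell = (int(x1 > cut), int(x2 > cut))  # categorized exposures (D1, D2)
        sums[cell] += y
        counts[cell] += 1
    mean = {k: sums[k] / counts[k] for k in sums}
    return mean[1, 1] - mean[1, 0] - mean[0, 1] + mean[0, 0]


if __name__ == "__main__":
    # Cut at the median (0 for a standard normal), correlated exposures.
    print("median cut, rho=0.5:", round(spurious_interaction(0.5, 0.0), 3))
    # Cut away from the median, uncorrelated exposures.
    print("cut=1, rho=0.0:", round(spurious_interaction(0.0, 1.0), 3))
    # Cut away from the median, correlated exposures.
    print("cut=1, rho=0.5:", round(spurious_interaction(0.5, 1.0), 3))
```

In line with the abstract, the estimate should be close to zero (up to sampling noise) in the first two runs and clearly nonzero in the third. Under these normal assumptions the two exceptions have short explanations. With the cut at 0, the joint distribution is symmetric under (X1, X2) to (-X1, -X2), which maps cell (1,1) onto (0,0) and cell (1,0) onto (0,1), so the cell means cancel in pairs. With rho = 0, E[X1 | D1, D2] depends on D1 only and E[X2 | D1, D2] on D2 only, so the cell means are additive and the contrast vanishes.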
Record ID: pubmed-6367751
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Res Methodol
Article type: Research Article
Publication date: 2019-02-07
Copyright: © The Author(s). 2019
License: Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.