A compelling demonstration of why traditional statistical regression models cannot be used to identify risk factors from case data on infectious diseases: a simulation study

BACKGROUND: Regression models are often used to explain the relative risk of infectious diseases among groups. For example, overrepresentation of immigrants among COVID-19 cases has been found in multiple countries. Several studies apply regression models to investigate whether different risk factor...

Bibliographic Details
Main Authors: Engebretsen, Solveig; Rø, Gunnar; de Blasio, Birgitte Freiesleben
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123765/
https://www.ncbi.nlm.nih.gov/pubmed/35596137
http://dx.doi.org/10.1186/s12874-022-01565-1
_version_ 1784711620884168704
author Engebretsen, Solveig
Rø, Gunnar
de Blasio, Birgitte Freiesleben
collection PubMed
description BACKGROUND: Regression models are often used to explain the relative risk of infectious diseases among groups. For example, overrepresentation of immigrants among COVID-19 cases has been found in multiple countries. Several studies apply regression models to investigate whether different risk factors can explain this overrepresentation among immigrants without considering dependence between the cases. METHODS: We study the appropriateness of traditional statistical regression methods for identifying risk factors for infectious diseases, by a simulation study. We model infectious disease spread by a simple, population-structured version of an SIR (susceptible-infected-recovered) model, which is one of the most famous and well-established models for infectious disease spread. The population is thus divided into different sub-groups. We vary the contact structure between the sub-groups of the population. We analyse the relation between individual-level risk of infection and group-level relative risk. We analyse whether Poisson regression estimators can capture the true, underlying parameters of transmission. We assess both the quantitative and qualitative accuracy of the estimated regression coefficients. RESULTS: We illustrate that there is no clear relationship between differences in individual characteristics and group-level overrepresentation: small differences on the individual level can result in arbitrarily high overrepresentation. We demonstrate that individual risk of infection cannot be properly defined without simultaneous specification of the infection level of the population. We argue that the estimated regression coefficients are not interpretable and show that it is not possible to adjust for other variables by standard regression methods. Finally, we illustrate that regression models can result in the significance of variables unrelated to infection risk in the constructed simulation example (e.g. ethnicity), particularly when a large proportion of contacts is within the same group. CONCLUSIONS: Traditional regression models which are valid for modelling risk between groups for non-communicable diseases are not valid for infectious diseases. By applying such methods to identify risk factors of infectious diseases, one risks ending up with wrong conclusions. Output from such analyses should therefore be treated with great caution. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-022-01565-1.
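The dependence of group-level relative risk on the overall infection level, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' code or their parameterisation: the two-group structure, the mixing rule (a fixed share of contacts within one's own group, the rest spread in proportion to group size), the forward Euler integration, and every numeric value below are assumptions chosen for illustration only.

```python
def attack_rates(beta, gamma=0.2, susceptibility=(1.0, 1.2),
                 pop_frac=(0.85, 0.15), within=0.5,
                 seed_frac=1e-4, dt=0.05, t_max=1000.0):
    """Share of each group ever infected in a group-structured SIR model.

    All parameter names and defaults are hypothetical. State variables are
    proportions within each group; integration is plain forward Euler.
    """
    n = len(pop_frac)
    # mixing[i][j]: share of group i's contacts that are with group j.
    # A fraction `within` stays in the own group; the rest is spread
    # over all groups in proportion to their population share.
    mixing = [[within * (i == j) + (1.0 - within) * pop_frac[j]
               for j in range(n)] for i in range(n)]
    sus = [1.0 - seed_frac] * n   # susceptible proportion per group
    inf = [seed_frac] * n         # infectious proportion per group
    for _ in range(round(t_max / dt)):
        # Force of infection on group i scales with its susceptibility
        # and with the prevalence among the people it has contact with.
        force = [beta * susceptibility[i]
                 * sum(mixing[i][j] * inf[j] for j in range(n))
                 for i in range(n)]
        new_inf = [force[i] * sus[i] * dt for i in range(n)]
        recovered = [gamma * inf[i] * dt for i in range(n)]
        sus = [sus[i] - new_inf[i] for i in range(n)]
        inf = [inf[i] + new_inf[i] - recovered[i] for i in range(n)]
    # Everyone who left the susceptible class (including initial seeds).
    return [1.0 - sus[i] for i in range(n)]


def relative_risk(beta, **kwargs):
    """Attack rate of the second group divided by that of the first."""
    rates = attack_rates(beta, **kwargs)
    return rates[1] / rates[0]


if __name__ == "__main__":
    # Same individual-level susceptibility ratio, different transmission levels.
    for beta in (0.25, 0.4, 1.0):
        print(f"beta={beta:.2f}  relative risk={relative_risk(beta):.3f}")
```

In this sketch the individual-level susceptibility ratio is held fixed while the transmission rate `beta` varies, and the group-level relative risk changes with it. That is the sense in which a single regression coefficient cannot stand in for an individual-level risk factor without also fixing the epidemic level.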
format Online
Article
Text
id pubmed-9123765
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9123765 2022-05-22 BMC Med Res Methodol Research BioMed Central 2022-05-20 /pmc/articles/PMC9123765/ /pubmed/35596137 http://dx.doi.org/10.1186/s12874-022-01565-1 Text en © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A compelling demonstration of why traditional statistical regression models cannot be used to identify risk factors from case data on infectious diseases: a simulation study
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9123765/
https://www.ncbi.nlm.nih.gov/pubmed/35596137
http://dx.doi.org/10.1186/s12874-022-01565-1