A simulation study of confounding in generalized linear models for air pollution epidemiology.

Bibliographic details
Main authors: Chen, C; Chock, D P; Winkler, S L
Format: Text
Language: English
Published: 1999
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1566403/
https://www.ncbi.nlm.nih.gov/pubmed/10064552
Collection: PubMed
ID: pubmed-1566403
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Environ Health Perspect, 1999-03 (Research Article)

Abstract
Confounding between the model covariates and causal variables (which may or may not be included as model covariates) is a well-known problem in regression models used in air pollution epidemiology. This problem is usually acknowledged but hardly ever investigated, especially in the context of generalized linear models. Using synthetic data sets, the present study shows how model overfit, underfit, and misfit in the presence of correlated causal variables in a Poisson regression model affect the estimated coefficients of the covariates and their confidence levels. The study also shows how this effect changes with the ranges of the covariates and the sample size. There is qualitative agreement between these study results and the corresponding expressions in the large-sample limit for ordinary linear models. Confounding of covariates in an overfitted model (with covariates encompassing more than just the causal variables) does not bias the estimated coefficients but reduces their significance. Model underfit (with some causal variables excluded as covariates) or misfit (with covariates encompassing only noncausal variables), on the other hand, leads not only to erroneous estimated coefficients but also to a misguided confidence, represented by large t-values, that the estimated coefficients are significant. The results of this study indicate that models which use only one or two air quality variables, such as particulate matter ≤ 10 µm (PM10) and sulfur dioxide, are probably unreliable, and that models containing several correlated and toxic or potentially toxic air quality variables should also be investigated in order to minimize model underfit or misfit.
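The abstract's central contrast (overfit leaves coefficients unbiased but less significant; underfit and misfit give wrong coefficients backed by large t-values) can be reproduced in a short simulation. This is a minimal sketch, not the authors' design: the three covariates, their pairwise correlation of 0.6, the true coefficients (0.10 for the causal variables x1 and x2, zero for the noncausal x3), the baseline mean of 20 counts, n = 3000, and the seed are all hypothetical choices. A hand-written Newton-Raphson fit of a log-link Poisson model stands in for a statistics package so that the example needs only the Python standard library. For the ordinary linear model, the familiar large-sample counterpart is the omitted-variable result: dropping x2 moves the x1 coefficient toward β1 + β2·Cov(x1, x2)/Var(x1).

```python
import math
import random

# Hypothetical "true" model: log E[y] = BETA0 + BETA1*x1 + BETA2*x2; x3 has no effect.
BETA0, BETA1, BETA2 = math.log(20.0), 0.10, 0.10


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(a[row_idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_val = a[col][col]
        a[col] = [v / pivot_val for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_poisson(columns, y, tol=1e-10, max_iter=50):
    """Newton-Raphson MLE for a log-link Poisson GLM with intercept; returns (coefs, SEs)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((y[i] - mu[i]) * X[i][j] for i in range(n)) for j in range(p)]
        # Fisher information X' W X with W = diag(mu) for the canonical log link.
        info = [[sum(mu[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        cov = invert(info)
        step = [sum(cov[j][k] * grad[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Model-based SEs from the last information matrix (the final step is below tol).
    se = [math.sqrt(cov[j][j]) for j in range(p)]
    return beta, se


def simulate(n, rho, seed):
    """Synthetic counts driven by x1 and x2; x3 is correlated with both but noncausal."""
    rng = random.Random(seed)
    x1, x2, x3, y = [], [], [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor: unit variances, pairwise corr rho
        v = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(3)]
        x1.append(v[0])
        x2.append(v[1])
        x3.append(v[2])
        y.append(poisson_sample(math.exp(BETA0 + BETA1 * v[0] + BETA2 * v[1]), rng))
    return x1, x2, x3, y


x1, x2, x3, y = simulate(n=3000, rho=0.6, seed=1)

# Covariate sets named after the abstract's cases.
specs = {
    "correct": [x1, x2],
    "overfit": [x1, x2, x3],
    "underfit": [x1],
    "misfit": [x3],
}
results = {name: fit_poisson(cols, y) for name, cols in specs.items()}

for name, (beta, se) in results.items():
    slopes = ", ".join(f"{b:+.3f} (t={b / s:.1f})" for b, s in zip(beta[1:], se[1:]))
    print(f"{name:9s} {slopes}")
```

Under these assumptions (standard-normal covariates with correlation ρ = 0.6), the underfit slope on x1 should land near β1 + ρβ2 = 0.16 and the misfit slope on x3 near ρ(β1 + β2) = 0.12, both with large t-values even though neither equals a true effect. The overfit model should recover x1 and x2 near 0.10, with x3 near zero and somewhat larger standard errors than the correctly specified model.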