Classification and regression trees for epidemiologic research: an air pollution example

BACKGROUND: Identifying and characterizing how mixtures of exposures are associated with health endpoints is challenging. We demonstrate how classification and regression trees can be used to generate hypotheses regarding joint effects from exposure mixtures. METHODS: We illustrate the approach by i...

Bibliographic Details
Main Authors: Gass, Katherine; Klein, Mitch; Chang, Howard H; Flanders, W Dana; Strickland, Matthew J
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977944/
https://www.ncbi.nlm.nih.gov/pubmed/24625053
http://dx.doi.org/10.1186/1476-069X-13-17
_version_ 1782310482433015808
author Gass, Katherine
Klein, Mitch
Chang, Howard H
Flanders, W Dana
Strickland, Matthew J
author_facet Gass, Katherine
Klein, Mitch
Chang, Howard H
Flanders, W Dana
Strickland, Matthew J
author_sort Gass, Katherine
collection PubMed
description BACKGROUND: Identifying and characterizing how mixtures of exposures are associated with health endpoints is challenging. We demonstrate how classification and regression trees can be used to generate hypotheses regarding joint effects from exposure mixtures. METHODS: We illustrate the approach by investigating the joint effects of CO, NO2, O3, and PM2.5 on emergency department visits for pediatric asthma in Atlanta, Georgia. Pollutant concentrations were categorized as quartiles. Days when all pollutants were in the lowest quartile were held out as the referent group (n = 131) and the remaining 3,879 days were used to estimate the regression tree. Pollutants were parameterized as dichotomous variables representing each ordinal split of the quartiles (e.g. comparing CO quartile 1 vs. CO quartiles 2–4) and considered one at a time in a Poisson case-crossover model with control for confounding. The pollutant-split resulting in the smallest P-value was selected as the first split and the dataset was partitioned accordingly. This process repeated for each subset of the data until the P-values for the remaining splits were not below a given alpha, resulting in the formation of a “terminal node”. We used the case-crossover model to estimate the adjusted risk ratio for each terminal node compared to the referent group, as well as the likelihood ratio test for the inclusion of the terminal nodes in the final model. RESULTS: The largest risk ratio corresponded to days when PM2.5 was in the highest quartile and NO2 was in the lowest two quartiles (RR: 1.10, 95% CI: 1.05, 1.16). A simultaneous Wald test for the inclusion of all terminal nodes in the model was significant, with a chi-square statistic of 34.3 (p = 0.001, with 13 degrees of freedom). 
CONCLUSIONS: Regression trees can be used to hypothesize about joint effects of exposure mixtures and may be particularly useful in the field of air pollution epidemiology for gaining a better understanding of complex multipollutant exposures.
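
The splitting procedure described in METHODS can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not give the case-crossover model specification or the alpha used, so the sketch swaps in a simple Wald z-test on the difference in mean daily visit counts (treating counts as Poisson, with no control for confounding) and a hypothetical default of alpha = 0.05. The function names, the "visits" key, and the dictionary layout are all illustrative choices.

```python
import math
from statistics import mean

POLLUTANTS = ("CO", "NO2", "O3", "PM2.5")


def candidate_splits():
    """All ordinal splits of quartiles 1-4 for each pollutant.

    cut=1 compares Q1 vs Q2-4, cut=2 compares Q1-2 vs Q3-4,
    cut=3 compares Q1-3 vs Q4.
    """
    return [(p, cut) for p in POLLUTANTS for cut in (1, 2, 3)]


def split_p_value(days, pollutant, cut):
    """Two-sided p-value for one dichotomous split.

    ASSUMPTION: a Wald z-test on the difference of Poisson means stands in
    for the Poisson case-crossover model described in the abstract.
    """
    low = [d["visits"] for d in days if d[pollutant] <= cut]
    high = [d["visits"] for d in days if d[pollutant] > cut]
    if not low or not high:
        return 1.0  # split does not partition this subset
    # Variance of a Poisson sample mean is estimated by mean / n.
    se = math.sqrt(mean(low) / len(low) + mean(high) / len(high))
    if se == 0:
        return 1.0
    z = (mean(high) - mean(low)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


def grow_tree(days, alpha=0.05):
    """Recursively split on the smallest p-value until none is below alpha."""
    best = min(candidate_splits(), key=lambda s: split_p_value(days, *s))
    p = split_p_value(days, *best)
    if p >= alpha:
        # No remaining split is significant: this subset is a terminal node.
        return {"terminal": True, "n_days": len(days)}
    pollutant, cut = best
    return {
        "terminal": False,
        "split": (pollutant, cut),
        "p_value": p,
        "low": grow_tree([d for d in days if d[pollutant] <= cut], alpha),
        "high": grow_tree([d for d in days if d[pollutant] > cut], alpha),
    }
```

Each element of days is assumed to be a dictionary holding a quartile (1 to 4) for every pollutant plus a daily count under "visits"; days on which all pollutants fall in the lowest quartile would be held out as the referent group before calling grow_tree. In a faithful analysis, split_p_value would be replaced by the p-value from the adjusted case-crossover model, and the terminal nodes would then be compared to the referent group to obtain risk ratios.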
format Online
Article
Text
id pubmed-3977944
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-39779442014-04-08 Classification and regression trees for epidemiologic research: an air pollution example Gass, Katherine Klein, Mitch Chang, Howard H Flanders, W Dana Strickland, Matthew J Environ Health Methodology BACKGROUND: Identifying and characterizing how mixtures of exposures are associated with health endpoints is challenging. We demonstrate how classification and regression trees can be used to generate hypotheses regarding joint effects from exposure mixtures. METHODS: We illustrate the approach by investigating the joint effects of CO, NO2, O3, and PM2.5 on emergency department visits for pediatric asthma in Atlanta, Georgia. Pollutant concentrations were categorized as quartiles. Days when all pollutants were in the lowest quartile were held out as the referent group (n = 131) and the remaining 3,879 days were used to estimate the regression tree. Pollutants were parameterized as dichotomous variables representing each ordinal split of the quartiles (e.g. comparing CO quartile 1 vs. CO quartiles 2–4) and considered one at a time in a Poisson case-crossover model with control for confounding. The pollutant-split resulting in the smallest P-value was selected as the first split and the dataset was partitioned accordingly. This process repeated for each subset of the data until the P-values for the remaining splits were not below a given alpha, resulting in the formation of a “terminal node”. We used the case-crossover model to estimate the adjusted risk ratio for each terminal node compared to the referent group, as well as the likelihood ratio test for the inclusion of the terminal nodes in the final model. RESULTS: The largest risk ratio corresponded to days when PM2.5 was in the highest quartile and NO2 was in the lowest two quartiles (RR: 1.10, 95% CI: 1.05, 1.16). 
A simultaneous Wald test for the inclusion of all terminal nodes in the model was significant, with a chi-square statistic of 34.3 (p = 0.001, with 13 degrees of freedom). CONCLUSIONS: Regression trees can be used to hypothesize about joint effects of exposure mixtures and may be particularly useful in the field of air pollution epidemiology for gaining a better understanding of complex multipollutant exposures. BioMed Central 2014-03-13 /pmc/articles/PMC3977944/ /pubmed/24625053 http://dx.doi.org/10.1186/1476-069X-13-17 Text en Copyright © 2014 Gass et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Gass, Katherine
Klein, Mitch
Chang, Howard H
Flanders, W Dana
Strickland, Matthew J
Classification and regression trees for epidemiologic research: an air pollution example
title Classification and regression trees for epidemiologic research: an air pollution example
title_full Classification and regression trees for epidemiologic research: an air pollution example
title_fullStr Classification and regression trees for epidemiologic research: an air pollution example
title_full_unstemmed Classification and regression trees for epidemiologic research: an air pollution example
title_short Classification and regression trees for epidemiologic research: an air pollution example
title_sort classification and regression trees for epidemiologic research: an air pollution example
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977944/
https://www.ncbi.nlm.nih.gov/pubmed/24625053
http://dx.doi.org/10.1186/1476-069X-13-17