Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons

Bibliographic details
Main authors: Sun, Zhichao; Tao, Yebin; Li, Shi; Ferguson, Kelly K; Meeker, John D; Park, Sung Kyun; Batterman, Stuart A; Mukherjee, Bhramar
Format: Online article, text
Language: English
Published: BioMed Central, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3857674/
https://www.ncbi.nlm.nih.gov/pubmed/24093917
http://dx.doi.org/10.1186/1476-069X-12-85
Collection: PubMed

Description

BACKGROUND: As public awareness of the consequences of environmental exposures has grown, estimating the adverse health effects of simultaneous exposure to multiple pollutants has become an important topic to explore. The challenges of evaluating the health impacts of environmental factors in a multipollutant model include, but are not limited to: identifying the most critical components of the pollutant mixture, examining potential interaction effects, and attributing health effects to individual pollutants in the presence of multicollinearity.

METHODS: In this paper, we reviewed five methods available in the statistical literature that are potentially helpful for constructing multipollutant models. We conducted a simulation study and presented two data examples to assess the performance of these methods in feature selection, effect estimation and interaction identification, using both cross-sectional and time-series designs. We also proposed and evaluated a two-step strategy: an initial screening by a tree-based method, followed at the second step by further dimension reduction/variable selection with the five approaches above.

RESULTS: Among the five methods, least absolute shrinkage and selection operator (LASSO) regression generally performs well in identifying important exposures, but yields biased estimates and a slightly larger model dimension when there are many correlated candidate exposures and a modest sample size. Bayesian model averaging and supervised principal component analysis are also useful for variable selection when there is a moderately strong exposure-response association. With the two-step modeling strategy, all five statistical methods showed substantial improvements in reducing model dimension and identifying important variables when the number of candidate variables was large.

CONCLUSIONS: No single method dominates across all simulation scenarios and all criteria. Performance differs according to the nature of the response variable, the sample size, the number of pollutants involved, and the strength of the exposure-response association/interaction. However, the two-step modeling strategy proposed here is potentially applicable to multipollutant frameworks with many covariates, because it takes advantage of both the screening feature of an initial tree-based method and the dimension reduction/variable selection property of the subsequent method. The choice of method should also depend on the goal of the study: risk prediction, effect estimation, or screening for important predictors and their interactions.
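
The two-step strategy summarized in the description (tree-based screening, then variable selection) can be illustrated with a short sketch. This is not the authors' implementation. As simplifying assumptions, the screening step ranks exposures by the best single split of a depth-1 regression tree (a "stump"), and the second step uses LASSO, one of the methods named, fitted by plain coordinate descent. The helper names, the number of exposures kept (`keep`), the penalty `lam`, and the simulated data are all hypothetical.

```python
import random

# Simplified two-step multipollutant selection: a sketch, NOT the paper's code.
# Hypothetical assumptions: depth-1 regression trees ("stumps") stand in for the
# tree-based screen, and LASSO is fitted by plain cyclic coordinate descent.


def stump_importance(x, y, min_leaf=10):
    """Largest drop in squared error from one split on x (a depth-1 regression tree)."""
    n = len(y)
    pairs = sorted(zip(x, y))
    ys = [v for _, v in pairs]
    total = sum(ys)
    best, left_sum = 0.0, 0.0
    for k in range(1, n):
        left_sum += ys[k - 1]  # y-sum of the k observations with the smallest x
        if k < min_leaf or n - k < min_leaf or pairs[k - 1][0] == pairs[k][0]:
            continue  # leaf too small, or split would fall between tied x values
        right_sum = total - left_sum
        # Between-group sum of squares for the split k | n - k
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        best = max(best, gain)
    return best


def standardize(col):
    """Center to mean 0 and scale to variance 1 (1/n convention)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column is standardized and y is centered, so no intercept is needed.
    """
    n, p = len(y), len(cols)
    b = [0.0] * p
    resid = list(y)  # y - Xb, equal to y while b is all zeros
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # rho = (1/n) x_j^T (resid + x_j b_j), using (1/n) x_j^T x_j = 1
            rho = sum(xi * ri for xi, ri in zip(xj, resid)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [ri - delta * xi for ri, xi in zip(resid, xj)]
                b[j] = new_bj
    return b


def two_step_select(x_cols, y, keep=4, lam=0.5):
    """Step 1: keep the `keep` exposures with the largest stump importance.
    Step 2: LASSO on the standardized survivors; returns {exposure index: coefficient}."""
    scores = [stump_importance(col, y) for col in x_cols]
    screened = sorted(range(len(x_cols)), key=lambda j: scores[j], reverse=True)[:keep]
    ybar = sum(y) / len(y)
    coefs = lasso_cd([standardize(x_cols[j]) for j in screened], [v - ybar for v in y], lam)
    return dict(zip(screened, coefs))


# Hypothetical simulated data: 10 independent exposures, only exposures 0 and 1 matter.
rng = random.Random(2013)
n, p = 200, 10
x_cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * x_cols[0][i] + 2 * x_cols[1][i] + rng.gauss(0, 0.5) for i in range(n)]

fit = two_step_select(x_cols, y)
selected = {j: round(c, 2) for j, c in fit.items() if c != 0.0}
print("screened and fitted:", fit)
print("selected by LASSO:", selected)
```

With these settings the screen is expected to retain exposures 0 and 1 together with a few noise exposures, and the LASSO step then shrinks the coefficients of the noise exposures, typically to exactly zero. In a real analysis the penalty would usually be chosen by cross-validation, and the screen would use a full tree or a tree ensemble rather than stumps.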

ID: pubmed-3857674
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Environ Health (Research article)
Publication date: 2013-10-04
Copyright © 2013 Sun et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.