CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R

Bibliographic Details
Main authors:	McCoy, David; Hubbard, Alan; Van der Laan, Mark
Format:	Online Article Text
Language:	English
Published:	2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312067/ https://www.ncbi.nlm.nih.gov/pubmed/37398941 http://dx.doi.org/10.21105/joss.04181

_version_	1785066876717498368
author	McCoy, David; Hubbard, Alan; Van der Laan, Mark
author_facet	McCoy, David; Hubbard, Alan; Van der Laan, Mark
author_sort	McCoy, David
collection	PubMed
description	Statistical causal inference of mixed exposures has been limited by reliance on parametric models and, until recently, by researchers considering only one exposure at a time, usually estimated as a beta coefficient in a generalized linear regression model (GLM). This independent assessment of exposures poorly estimates the joint impact of a collection of the same exposures in a realistic exposure setting. Marginal methods for mixture variable selection, such as ridge/lasso regression, are biased by linear assumptions, and the interactions modeled are chosen by the user. Clustering methods such as principal component regression lose both interpretability and valid inference. Newer mixture methods such as quantile g-computation (Keil et al., 2020) are biased by linear/additive assumptions. More flexible methods such as Bayesian kernel machine regression (BKMR) (Bobb et al., 2014) are sensitive to the choice of tuning parameters, are computationally taxing, and lack an interpretable and robust summary statistic of dose-response relationships. No method currently exists that finds the best flexible model to adjust for covariates while applying a non-parametric model that targets interactions in a mixture and delivers valid inference for a target parameter. Non-parametric methods such as decision trees are a useful tool for evaluating combined exposures: they find partitions in the joint-exposure (mixture) space that best explain the variance in an outcome. However, current methods that use decision trees to assess statistical inference for interactions are biased and prone to overfitting because they use the full data both to identify nodes in the tree and to make statistical inference given these nodes. Other methods derive inference from an independent test set and therefore do not use the full data.
The CVtreeMLE R package provides researchers in (bio)statistics, epidemiology, and environmental health sciences with access to state-of-the-art statistical methodology for evaluating the causal effects of a data-adaptively determined mixed exposure using decision trees. Our target audience is analysts who would normally use a potentially biased GLM-based model for a mixed exposure. Instead, we hope to provide users with a non-parametric statistical machine in which users simply specify the exposures, covariates, and outcome; CVtreeMLE then determines whether a best-fitting decision tree exists and delivers interpretable results.
format	Online Article Text
id	pubmed-10312067
institution	National Center for Biotechnology Information
language	English
publishDate	2023
record_format	MEDLINE/PubMed
spelling	pubmed-10312067 2023-06-30 CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R McCoy, David; Hubbard, Alan; Van der Laan, Mark J Open Source Softw Article 2023 2023-02-21 /pmc/articles/PMC10312067/ /pubmed/37398941 http://dx.doi.org/10.21105/joss.04181 Text en https://creativecommons.org/licenses/by/4.0/ License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).
title	CVtreeMLE: Efficient Estimation of Mixed Exposures using Data Adaptive Decision Trees and Cross-Validated Targeted Maximum Likelihood Estimation in R
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312067/ https://www.ncbi.nlm.nih.gov/pubmed/37398941 http://dx.doi.org/10.21105/joss.04181
