
A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases

PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and reduce unmeasured confounding bias. METHODS: A simulation study with known true effects was conducted. Completely synthetic and partially synthetic data incorporating real large healthcare data were generated. We compared Bayesian adjustment for confounding (BAC), generalized Bayesian causal effect estimation (GBCEE), Group Lasso and Doubly robust estimation, high‐dimensional propensity score (hdPS), and scalable collaborative targeted maximum likelihood algorithms. For the hdPS, two adjustment approaches targeting the effect in the whole population were considered: Full matching and inverse probability weighting. RESULTS: In scenarios without hidden confounders, most methods were essentially unbiased. The bias and variance of the hdPS varied considerably according to the number of variables selected by the algorithm. In scenarios with hidden confounders, substantial bias reduction was achieved by using machine‐learning methods to identify proxies as compared to adjusting only by observed confounders. hdPS and Group Lasso performed poorly in the partially synthetic simulation. BAC, GBCEE, and scalable collaborative‐targeted maximum likelihood algorithms performed particularly well. CONCLUSIONS: Machine learning can help to identify measured confounders in large healthcare databases. They can also capitalize on proxies of unmeasured confounders to substantially reduce residual confounding bias.


Bibliographic Details
Main Authors: Benasseur, Imane, Talbot, Denis, Durand, Madeleine, Holbrook, Anne, Matteau, Alexis, Potter, Brian J., Renoux, Christel, Schnitzer, Mireille E., Tarride, Jean‐Éric, Guertin, Jason R.
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304306/
https://www.ncbi.nlm.nih.gov/pubmed/34953160
http://dx.doi.org/10.1002/pds.5403
_version_ 1784752074202808320
author Benasseur, Imane
Talbot, Denis
Durand, Madeleine
Holbrook, Anne
Matteau, Alexis
Potter, Brian J.
Renoux, Christel
Schnitzer, Mireille E.
Tarride, Jean‐Éric
Guertin, Jason R.
author_facet Benasseur, Imane
Talbot, Denis
Durand, Madeleine
Holbrook, Anne
Matteau, Alexis
Potter, Brian J.
Renoux, Christel
Schnitzer, Mireille E.
Tarride, Jean‐Éric
Guertin, Jason R.
author_sort Benasseur, Imane
collection PubMed
description PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and reduce unmeasured confounding bias. METHODS: A simulation study with known true effects was conducted. Completely synthetic and partially synthetic data incorporating real large healthcare data were generated. We compared Bayesian adjustment for confounding (BAC), generalized Bayesian causal effect estimation (GBCEE), Group Lasso and Doubly robust estimation, high‐dimensional propensity score (hdPS), and scalable collaborative targeted maximum likelihood algorithms. For the hdPS, two adjustment approaches targeting the effect in the whole population were considered: Full matching and inverse probability weighting. RESULTS: In scenarios without hidden confounders, most methods were essentially unbiased. The bias and variance of the hdPS varied considerably according to the number of variables selected by the algorithm. In scenarios with hidden confounders, substantial bias reduction was achieved by using machine‐learning methods to identify proxies as compared to adjusting only by observed confounders. hdPS and Group Lasso performed poorly in the partially synthetic simulation. BAC, GBCEE, and scalable collaborative‐targeted maximum likelihood algorithms performed particularly well. CONCLUSIONS: Machine learning can help to identify measured confounders in large healthcare databases. They can also capitalize on proxies of unmeasured confounders to substantially reduce residual confounding bias.
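The abstract above mentions inverse probability weighting as one of the hdPS adjustment approaches targeting the effect in the whole population. The following is a minimal, generic sketch of that idea only, not the authors' code and not the hdPS variable-selection algorithm itself: a logistic propensity score model followed by inverse probability weighting to estimate the average treatment effect. All variable names and the simulated data are hypothetical illustrations.

```python
# Illustrative sketch (assumed, not from the paper): IPW estimation of the
# average treatment effect in the whole population, with the propensity score
# from a logistic regression. X, A, and Y are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                      # measured covariates (e.g., claims-based proxies)
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # binary exposure depending on X
Y = 1.0 * A + X[:, 0] + rng.normal(size=n)       # outcome with a true effect of 1.0

# Propensity score: P(A = 1 | X)
ps = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]

# Stabilized inverse probability weights targeting the whole population (ATE)
w = np.where(A == 1, A.mean() / ps, (1 - A.mean()) / (1 - ps))

# Weighted difference in mean outcomes estimates the average treatment effect
ate = np.average(Y[A == 1], weights=w[A == 1]) - np.average(Y[A == 0], weights=w[A == 0])
print(f"IPW estimate of the average treatment effect: {ate:.3f}")
```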
format Online
Article
Text
id pubmed-9304306
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-9304306 2022-07-28
A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
Benasseur, Imane; Talbot, Denis; Durand, Madeleine; Holbrook, Anne; Matteau, Alexis; Potter, Brian J.; Renoux, Christel; Schnitzer, Mireille E.; Tarride, Jean‐Éric; Guertin, Jason R.
Pharmacoepidemiol Drug Saf, Original Articles
PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and reduce unmeasured confounding bias. METHODS: A simulation study with known true effects was conducted. Completely synthetic and partially synthetic data incorporating real large healthcare data were generated. We compared Bayesian adjustment for confounding (BAC), generalized Bayesian causal effect estimation (GBCEE), Group Lasso and Doubly robust estimation, high‐dimensional propensity score (hdPS), and scalable collaborative targeted maximum likelihood algorithms. For the hdPS, two adjustment approaches targeting the effect in the whole population were considered: Full matching and inverse probability weighting. RESULTS: In scenarios without hidden confounders, most methods were essentially unbiased. The bias and variance of the hdPS varied considerably according to the number of variables selected by the algorithm. In scenarios with hidden confounders, substantial bias reduction was achieved by using machine‐learning methods to identify proxies as compared to adjusting only by observed confounders. hdPS and Group Lasso performed poorly in the partially synthetic simulation. BAC, GBCEE, and scalable collaborative‐targeted maximum likelihood algorithms performed particularly well. CONCLUSIONS: Machine learning can help to identify measured confounders in large healthcare databases. They can also capitalize on proxies of unmeasured confounders to substantially reduce residual confounding bias.
John Wiley & Sons, Inc. 2022-01-07 2022-04
/pmc/articles/PMC9304306/ /pubmed/34953160 http://dx.doi.org/10.1002/pds.5403
Text en
© 2021 The Authors. Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
spellingShingle Original Articles
Benasseur, Imane
Talbot, Denis
Durand, Madeleine
Holbrook, Anne
Matteau, Alexis
Potter, Brian J.
Renoux, Christel
Schnitzer, Mireille E.
Tarride, Jean‐Éric
Guertin, Jason R.
A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
title A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
title_full A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
title_fullStr A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
title_full_unstemmed A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
title_short A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
title_sort comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304306/
https://www.ncbi.nlm.nih.gov/pubmed/34953160
http://dx.doi.org/10.1002/pds.5403
work_keys_str_mv AT benasseurimane acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT talbotdenis acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT durandmadeleine acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT holbrookanne acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT matteaualexis acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT potterbrianj acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT renouxchristel acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT schnitzermireillee acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT tarridejeaneric acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT guertinjasonr acomparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT benasseurimane comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT talbotdenis comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT durandmadeleine comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT holbrookanne comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT matteaualexis comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT potterbrianj comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT renouxchristel comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT schnitzermireillee comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT tarridejeaneric comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases
AT guertinjasonr comparisonofconfounderselectionandadjustmentmethodsforestimatingcausaleffectsusinglargehealthcaredatabases