A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases
PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challeng...
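Main authors: | Benasseur, Imane; Talbot, Denis; Durand, Madeleine; Holbrook, Anne; Matteau, Alexis; Potter, Brian J.; Renoux, Christel; Schnitzer, Mireille E.; Tarride, Jean‐Éric; Guertin, Jason R. |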
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley & Sons, Inc. 2022 |
Subjects: | Original Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304306/ https://www.ncbi.nlm.nih.gov/pubmed/34953160 http://dx.doi.org/10.1002/pds.5403 |
_version_ | 1784752074202808320 |
---|---|
author | Benasseur, Imane Talbot, Denis Durand, Madeleine Holbrook, Anne Matteau, Alexis Potter, Brian J. Renoux, Christel Schnitzer, Mireille E. Tarride, Jean‐Éric Guertin, Jason R. |
author_facet | Benasseur, Imane Talbot, Denis Durand, Madeleine Holbrook, Anne Matteau, Alexis Potter, Brian J. Renoux, Christel Schnitzer, Mireille E. Tarride, Jean‐Éric Guertin, Jason R. |
author_sort | Benasseur, Imane |
collection | PubMed |
description | PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and reduce unmeasured confounding bias. METHODS: A simulation study with known true effects was conducted. Completely synthetic and partially synthetic data incorporating real large healthcare data were generated. We compared Bayesian adjustment for confounding (BAC), generalized Bayesian causal effect estimation (GBCEE), Group Lasso and Doubly robust estimation, high‐dimensional propensity score (hdPS), and scalable collaborative targeted maximum likelihood algorithms. For the hdPS, two adjustment approaches targeting the effect in the whole population were considered: Full matching and inverse probability weighting. RESULTS: In scenarios without hidden confounders, most methods were essentially unbiased. The bias and variance of the hdPS varied considerably according to the number of variables selected by the algorithm. In scenarios with hidden confounders, substantial bias reduction was achieved by using machine‐learning methods to identify proxies as compared to adjusting only by observed confounders. hdPS and Group Lasso performed poorly in the partially synthetic simulation. BAC, GBCEE, and scalable collaborative‐targeted maximum likelihood algorithms performed particularly well. CONCLUSIONS: Machine learning can help to identify measured confounders in large healthcare databases. They can also capitalize on proxies of unmeasured confounders to substantially reduce residual confounding bias. |
format | Online Article Text |
id | pubmed-9304306 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | John Wiley & Sons, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-93043062022-07-28 A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases Benasseur, Imane Talbot, Denis Durand, Madeleine Holbrook, Anne Matteau, Alexis Potter, Brian J. Renoux, Christel Schnitzer, Mireille E. Tarride, Jean‐Éric Guertin, Jason R. Pharmacoepidemiol Drug Saf Original Articles PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and reduce unmeasured confounding bias. METHODS: A simulation study with known true effects was conducted. Completely synthetic and partially synthetic data incorporating real large healthcare data were generated. We compared Bayesian adjustment for confounding (BAC), generalized Bayesian causal effect estimation (GBCEE), Group Lasso and Doubly robust estimation, high‐dimensional propensity score (hdPS), and scalable collaborative targeted maximum likelihood algorithms. For the hdPS, two adjustment approaches targeting the effect in the whole population were considered: Full matching and inverse probability weighting. RESULTS: In scenarios without hidden confounders, most methods were essentially unbiased. The bias and variance of the hdPS varied considerably according to the number of variables selected by the algorithm. In scenarios with hidden confounders, substantial bias reduction was achieved by using machine‐learning methods to identify proxies as compared to adjusting only by observed confounders. hdPS and Group Lasso performed poorly in the partially synthetic simulation. BAC, GBCEE, and scalable collaborative‐targeted maximum likelihood algorithms performed particularly well. 
CONCLUSIONS: Machine learning can help to identify measured confounders in large healthcare databases. They can also capitalize on proxies of unmeasured confounders to substantially reduce residual confounding bias. John Wiley & Sons, Inc. 2022-01-07 2022-04 /pmc/articles/PMC9304306/ /pubmed/34953160 http://dx.doi.org/10.1002/pds.5403 Text en © 2021 The Authors. Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made. |
spellingShingle | Original Articles Benasseur, Imane Talbot, Denis Durand, Madeleine Holbrook, Anne Matteau, Alexis Potter, Brian J. Renoux, Christel Schnitzer, Mireille E. Tarride, Jean‐Éric Guertin, Jason R. A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
title | A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
title_full | A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
title_fullStr | A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
title_full_unstemmed | A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
title_short | A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
title_sort | comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases |
topic | Original Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9304306/ https://www.ncbi.nlm.nih.gov/pubmed/34953160 http://dx.doi.org/10.1002/pds.5403 |