
External control arm analysis: an evaluation of propensity score approaches, G-computation, and doubly debiased machine learning

Bibliographic details
Main authors: Loiseau, Nicolas; Trichelair, Paul; He, Maxime; Andreux, Mathieu; Zaslavskiy, Mikhail; Wainrib, Gilles; Blum, Michael G. B.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9795588/
https://www.ncbi.nlm.nih.gov/pubmed/36577946
http://dx.doi.org/10.1186/s12874-022-01799-z
Collection: PubMed
Description:

BACKGROUND: An external control arm is a cohort of control patients whose data are collected outside a single-arm trial. To provide an unbiased estimate of efficacy, the clinical profiles of patients in the single arm and in the external arm should be aligned, typically using propensity score approaches. Alternative approaches infer efficacy by comparing the outcomes of single-arm patients with machine-learning predictions of control patient outcomes. These methods include G-computation and Doubly Debiased Machine Learning (DDML), and their evaluation for External Control Arm (ECA) analysis is insufficient.

METHODS: We consider both numerical simulations and a trial replication procedure to evaluate the different statistical approaches: propensity score matching, Inverse Probability of Treatment Weighting (IPTW), G-computation, and DDML. The replication study relies on five type 2 diabetes randomized clinical trials made available through the Yale University Open Data Access (YODA) project. From this pool of five trials, observational experiments are artificially built by replacing the control arm of one trial with an arm from another trial that contains similarly treated patients.

RESULTS: Among the statistical approaches, numerical simulations show that DDML has the smallest bias, followed by G-computation. G-computation usually has the smallest mean squared error. Compared with the other methods, DDML has variable mean squared error performance that improves with increasing sample size. For hypothesis testing, all methods control the type I error, and DDML is the most conservative. G-computation is the best method in terms of statistical power; DDML has comparable power at [Formula: see text] but lower power for smaller sample sizes. The replication procedure also indicates that G-computation minimizes mean squared error, whereas DDML has intermediate performance between G-computation and the propensity score approaches. The confidence intervals of G-computation are the narrowest, whereas those obtained with DDML are the widest for small sample sizes, which confirms its conservative nature.

CONCLUSIONS: For external control arm analyses, methods based on outcome prediction models can reduce estimation error and increase statistical power compared with propensity score approaches.
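The estimators compared in this abstract differ in which model they rely on. IPTW reweights external controls with a propensity model, G-computation predicts control outcomes for single-arm patients, and DDML combines both models with cross-fitting. The following Python sketch illustrates this on one simulated dataset. It is not the authors' implementation, and the simulation design, the function names (`simulate`, `iptw_att`, `gcomp_att`, `ddml_att`) and the tuning values are hypothetical.

The sketch makes the following assumptions:

- The target is the average treatment effect on the treated (ATT), that is, the effect in the single-arm population.
- The nuisance models are plain linear and logistic regressions fitted with NumPy. They stand in for the flexible machine-learning learners that DDML would normally use.
- The DDML-style estimator uses the standard doubly robust ATT score with 5-fold cross-fitting.
- Propensity score matching is left out for brevity.

```python
# Illustrative sketch only (not the paper's code): linear/logistic nuisance
# models stand in for ML learners; the simulation design is hypothetical.
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so every model has an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, slopes...]."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef


def predict_linear(coef, X):
    return add_intercept(X) @ coef


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson (tiny ridge term for a stable solve)."""
    Xd = add_intercept(X)
    w = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ w))
        grad = Xd.T @ (a - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hess + 1e-8 * np.eye(len(w)), grad)
    return w


def predict_propensity(w, X, clip=0.01):
    """P(single arm | X), clipped away from 0 and 1 to bound the weights."""
    e = 1.0 / (1.0 + np.exp(-add_intercept(X) @ w))
    return np.clip(e, clip, 1.0 - clip)


def simulate(n=2000, tau=1.0, seed=0):
    """Hypothetical design: A=1 single-arm patient, A=0 external control, true ATT = tau."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    # Confounding: X0 and X1 raise both trial membership and the outcome.
    A = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] + 0.4 * X[:, 1]))))
    Y = X @ np.array([1.0, 0.8, 0.6, 0.4, 0.2]) + tau * A + rng.normal(size=n)
    return X, A, Y


def iptw_att(X, A, Y):
    """IPTW for the ATT: single-arm weight 1, external controls weighted by e/(1-e)."""
    e = predict_propensity(fit_logistic(X, A), X)
    ctrl = A == 0
    w = e[ctrl] / (1.0 - e[ctrl])
    return Y[A == 1].mean() - np.sum(w * Y[ctrl]) / np.sum(w)


def gcomp_att(X, A, Y):
    """G-computation: fit a control outcome model, predict untreated outcomes of single-arm patients."""
    coef = fit_linear(X[A == 0], Y[A == 0])
    return np.mean(Y[A == 1] - predict_linear(coef, X[A == 1]))


def ddml_att(X, A, Y, n_folds=5, seed=0):
    """Cross-fitted doubly robust ATT score; nuisance models are fitted out of fold."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(Y)) % n_folds
    score = np.zeros(len(Y))
    for k in range(n_folds):
        train, test = fold != k, fold == k
        ctrl = train & (A == 0)
        m0 = predict_linear(fit_linear(X[ctrl], Y[ctrl]), X[test])
        e = predict_propensity(fit_logistic(X[train], A[train]), X[test])
        resid = Y[test] - m0
        score[test] = A[test] * resid - (1 - A[test]) * e / (1.0 - e) * resid
    return score.sum() / A.sum()


if __name__ == "__main__":
    X, A, Y = simulate()
    naive = Y[A == 1].mean() - Y[A == 0].mean()
    for name, est in [("naive", naive), ("IPTW", iptw_att(X, A, Y)),
                      ("G-comp", gcomp_att(X, A, Y)), ("DDML", ddml_att(X, A, Y))]:
        print(f"{name:>7}: {est:.3f}  (true ATT = 1.0)")
```

Run as a script, it prints the naive difference in means and the three adjusted estimates for one simulated dataset. In this design the naive contrast is inflated, because X0 and X1 increase both the chance of belonging to the single arm and the outcome. The adjusted estimators should land near the true ATT of 1.0. IPTW uses only the propensity model and G-computation only the outcome model. The DDML-style score combines both, so it remains consistent if either nuisance model is correctly specified.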
ID: pubmed-9795588
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Res Methodol (Research Article), BioMed Central, published 2022-12-28.
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.