Invited Commentary: Demystifying Statistical Inference When Using Machine Learning in Causal Research

In this issue, Naimi et al. (Am J Epidemiol. 2023;192(9):1536–1544) discuss a critical topic in public health and beyond: obtaining valid statistical inference when using machine learning in causal research. In doing so, the authors review recent prominent methodological work and recommend: 1) doubl...

Bibliographic Details
Main authors:	Balzer, Laura B; Westling, Ted
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Invited Commentary
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10472326/ https://www.ncbi.nlm.nih.gov/pubmed/34268553 http://dx.doi.org/10.1093/aje/kwab200

author	Balzer, Laura B Westling, Ted
collection	PubMed
description	In this issue, Naimi et al. (Am J Epidemiol. 2023;192(9):1536–1544) discuss a critical topic in public health and beyond: obtaining valid statistical inference when using machine learning in causal research. In doing so, the authors review recent prominent methodological work and recommend: 1) doubly robust estimators, such as targeted maximum likelihood estimation (TMLE); 2) ensemble methods, such as Super Learner, to combine predictions from a diverse library of algorithms; and 3) sample splitting to reduce bias and improve inference. We largely agree with these recommendations. In this commentary, we highlight the critical importance of the Super Learner library. Specifically, in both simulation settings considered by the authors, we demonstrate that reductions in bias and improvements in confidence-interval coverage can be achieved using TMLE without sample splitting and with a Super Learner library that excludes tree-based methods but includes regression splines. Whether extremely data-adaptive algorithms and sample splitting are needed depends on the specific problem and should be informed by simulations reflecting the specific application. More research is needed on practical recommendations for selecting among these options in common situations arising in epidemiology.
format	Online Article Text
id	pubmed-10472326
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10472326 2023-09-02 Invited Commentary: Demystifying Statistical Inference When Using Machine Learning in Causal Research Balzer, Laura B Westling, Ted Am J Epidemiol Invited Commentary Oxford University Press 2021-07-15 /pmc/articles/PMC10472326/ /pubmed/34268553 http://dx.doi.org/10.1093/aje/kwab200 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Invited Commentary: Demystifying Statistical Inference When Using Machine Learning in Causal Research
topic	Invited Commentary
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10472326/ https://www.ncbi.nlm.nih.gov/pubmed/34268553 http://dx.doi.org/10.1093/aje/kwab200