Table 2 Fallacy in Descriptive Epidemiology: Bringing Machine Learning to the Table

Bibliographic Details
Main Authors: Dharma, Christoffer; Fu, Rui; Chaiton, Michael
Format: Online, Article, Text
Language: English
Published: MDPI, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10340623/
https://www.ncbi.nlm.nih.gov/pubmed/37444042
http://dx.doi.org/10.3390/ijerph20136194
Description: There is a lack of rigorous methodological development for descriptive epidemiology, where the goal is to describe and identify the most important associations with an outcome given a large set of potential predictors. This has often led to the Table 2 fallacy, where one presents the coefficient estimates for all covariates from a single multivariable regression model, which are often uninterpretable in a descriptive analysis. We argue that machine learning (ML) is a potential solution to this problem. We illustrate the power of ML with an example analysis identifying the most important predictors of alcohol abuse among sexual minority youth. The framework we propose for this analysis is as follows: (1) identify a few ML methods for the analysis, (2) tune their hyperparameters on the whole dataset using nested cross-validation, (3) rank the variables using variable importance scores, (4) present partial dependence plots (PDPs) to illustrate the association between the important variables and the outcome, and (5) identify the strength of the interaction terms using the PDPs. We discuss the potential strengths and weaknesses of using ML methods for descriptive analysis and future directions for research. R code to reproduce these analyses is provided, which we invite other researchers to use.
Journal: Int J Environ Res Public Health, 2023-06-21
Rights: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
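
Steps (3) to (5) of the framework described in the abstract can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' R code. Steps (1) and (2) (choosing ML methods and tuning them with nested cross-validation) are skipped, and the tuned model is replaced by a hypothetical `toy_model` with a known structure. Variable importance is assumed to be permutation importance (the increase in mean squared error after shuffling one predictor), and interaction strength is assumed to be Friedman's H² statistic computed from partial dependence (PD) functions. The abstract does not name either measure. All function names and the simulated data are hypothetical.

```python
import random
from statistics import mean


def toy_model(x):
    # Hypothetical stand-in for a tuned ML model: x[0] acts additively,
    # x[1] and x[2] interact multiplicatively, and x[3] has no effect.
    return 2.0 * x[0] + 3.0 * x[1] * x[2]


def partial_dependence(model, data, features, values):
    """Average prediction over the data with the columns in `features` set to `values`."""
    total = 0.0
    for row in data:
        modified = list(row)
        for j, v in zip(features, values):
            modified[j] = v
        total += model(modified)
    return total / len(data)


def centered_pd(model, data, features):
    """PD function evaluated at each observation's own feature values, mean-centered."""
    values = [partial_dependence(model, data, features, [row[j] for j in features])
              for row in data]
    m = mean(values)
    return [v - m for v in values]


def h_squared(model, data, j, k):
    """Friedman's H^2: share of the joint (j, k) PD variation not explained
    by the sum of the two one-way PD functions (pairwise interaction strength)."""
    pd_jk = centered_pd(model, data, [j, k])
    pd_j = centered_pd(model, data, [j])
    pd_k = centered_pd(model, data, [k])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den if den > 0 else 0.0


def permutation_importance(model, data, y, j, seed=0):
    """Increase in mean squared error after randomly shuffling column j."""
    rng = random.Random(seed)
    base = mean((model(r) - t) ** 2 for r, t in zip(data, y))
    column = [r[j] for r in data]
    rng.shuffle(column)
    shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(data, column)]
    return mean((model(r) - t) ** 2 for r, t in zip(shuffled, y)) - base


# Simulated data (hypothetical): 4 predictors, outcome = model + small noise.
rng = random.Random(42)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]
y = [toy_model(r) + rng.gauss(0, 0.1) for r in X]

# Step (3): rank predictors by permutation importance.
importances = {j: permutation_importance(toy_model, X, y, j) for j in range(4)}
ranking = sorted(importances, key=importances.get, reverse=True)

# Step (4): one-way PD curve for x[0] over a grid (normally plotted as a PDP).
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
pd_curve = [partial_dependence(toy_model, X, [0], [v]) for v in grid]

# Step (5): pairwise interaction strength computed from the PD functions.
h_01 = h_squared(toy_model, X, 0, 1)
h_12 = h_squared(toy_model, X, 1, 2)

print("ranking:", ranking)
print("PD of x[0]:", [round(v, 2) for v in pd_curve])
print("H^2(0,1) =", round(h_01, 4), " H^2(1,2) =", round(h_12, 4))
```

Because `toy_model` ignores `x[3]`, its permutation importance is exactly zero and it ranks last. The PD curve for `x[0]` is a straight line with slope 2. H² is essentially zero for the additive pair (0, 1) but large for the multiplicative pair (1, 2). With real data, these functions would be applied to the model tuned in steps (1) and (2), and the PD curves would be plotted rather than printed.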