Table 2 Fallacy in Descriptive Epidemiology: Bringing Machine Learning to the Table
Main authors: | Dharma, Christoffer; Fu, Rui; Chaiton, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10340623/ https://www.ncbi.nlm.nih.gov/pubmed/37444042 http://dx.doi.org/10.3390/ijerph20136194 |
author | Dharma, Christoffer; Fu, Rui; Chaiton, Michael |
---|---|
collection | PubMed |
description | There is a lack of rigorous methodological development for descriptive epidemiology, where the goal is to describe and identify the most important associations with an outcome given a large set of potential predictors. This has often led to the Table 2 fallacy, where one presents the coefficient estimates for all covariates from a single multivariable regression model, which are often uninterpretable in a descriptive analysis. We argue that machine learning (ML) is a potential solution to this problem. We illustrate the power of ML with an example analysis identifying the most important predictors of alcohol abuse among sexual minority youth. The framework we propose for this analysis is as follows: (1) identify a few ML methods for the analysis; (2) optimize the parameters using the whole dataset with a nested cross-validation approach; (3) rank the variables using variable importance scores; (4) present partial dependence plots (PDPs) to illustrate the association between the important variables and the outcome; and (5) identify the strength of the interaction terms using the PDPs. We discuss the potential strengths and weaknesses of using ML methods for descriptive analysis and future directions for research. R code to reproduce these analyses is provided, which we invite other researchers to use. |
format | Online Article Text |
id | pubmed-10340623 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Int J Environ Res Public Health |
publication date | 2023-06-21 |
rights | © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Table 2 Fallacy in Descriptive Epidemiology: Bringing Machine Learning to the Table |
topic | Article |
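The five-step framework summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' R code. It is written in Python with NumPy only and rests on several assumptions: k-nearest-neighbor regression stands in for "a few ML methods", mean squared error is the loss, permutation importance is the variable importance score, and Friedman's H-statistic is used as the PDP-based interaction measure. Nested cross-validation is read here as "outer folds estimate error, inner folds tune k", followed by a final tuning on the whole dataset. All function names (knn_predict, nested_cv, h_statistic, etc.) are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_new, k):
    """k-NN regression (stand-in ML method): mean outcome of the k nearest training rows."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def cv_mse(X, y, k, folds):
    """Mean squared error of k-NN averaged over the given list of test-index folds."""
    all_idx = np.arange(len(y))
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))


def tune_k(X, y, k_grid, n_folds, rng):
    """Plain CV: return the k with the lowest cross-validated MSE on (X, y)."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    return min(k_grid, key=lambda k: cv_mse(X, y, k, folds))


def nested_cv(X, y, k_grid, outer=5, inner=3, seed=0):
    """Step 2 (assumed reading): outer folds score the model, k is re-tuned inside each."""
    rng = np.random.default_rng(seed)
    outer_folds = np.array_split(rng.permutation(len(y)), outer)
    all_idx = np.arange(len(y))
    scores, chosen = [], []
    for test_idx in outer_folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        best_k = tune_k(X[train_idx], y[train_idx], k_grid, inner, rng)
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], best_k)
        scores.append(np.mean((y[test_idx] - pred) ** 2))
        chosen.append(best_k)
    return float(np.mean(scores)), chosen


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Step 3: average increase in MSE when one column is shuffled (larger = more important)."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importance[j] += np.mean((y - predict(X_perm)) ** 2) - base
    return importance / n_repeats


def partial_dependence(predict, X, features, grid):
    """Step 4: mean prediction with the columns in `features` fixed to each row of `grid`."""
    values = []
    for point in grid:
        X_fixed = X.copy()
        X_fixed[:, features] = point
        values.append(predict(X_fixed).mean())
    return np.array(values)


def h_statistic(predict, X, j, k):
    """Step 5: Friedman's H for (j, k) from centered PDs evaluated at the observed rows."""
    pd_j = partial_dependence(predict, X, [j], X[:, [j]])
    pd_k = partial_dependence(predict, X, [k], X[:, [k]])
    pd_jk = partial_dependence(predict, X, [j, k], X[:, [j, k]])
    pd_j, pd_k, pd_jk = (v - v.mean() for v in (pd_j, pd_k, pd_jk))
    return float(np.sqrt(np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))
    # Synthetic outcome: main effect of x0, an x1*x2 interaction, x3 is pure noise.
    y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.3, size=200)
    k_grid = [1, 5, 15, 30]
    mse, ks = nested_cv(X, y, k_grid)
    k = tune_k(X, y, k_grid, 5, rng)  # final k tuned on the whole dataset

    def predict(Z):
        return knn_predict(X, y, Z, k)

    print("nested-CV MSE:", round(mse, 3), "k per outer fold:", ks)
    print("importance:", permutation_importance(predict, X, y).round(3))
    grid = np.linspace(-2, 2, 5)[:, None]
    print("PD of x0:", partial_dependence(predict, X, [0], grid).round(3))
    print("H(x1, x2):", round(h_statistic(predict, X, 1, 2), 3))
```

In the synthetic demo one would expect x3 to rank lowest and H(x1, x2) to sit clearly above zero, but the printed values depend on the random seed. Because importance, PDPs, and H are computed on the same data used to fit k-NN, they describe the fitted model rather than out-of-sample performance, which matches the descriptive (not predictive) aim stated in the abstract.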