Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning
BACKGROUND: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the f...
Main authors: | Levy, Joshua J.; O’Malley, A. James |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Technical Advance |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7325087/ https://www.ncbi.nlm.nih.gov/pubmed/32600277 http://dx.doi.org/10.1186/s12874-020-01046-3 |
_version_ | 1783552087073751040 |
---|---|
author | Levy, Joshua J. O’Malley, A. James |
author_facet | Levy, Joshua J. O’Malley, A. James |
author_sort | Levy, Joshua J. |
collection | PubMed |
description | BACKGROUND: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each. METHODS: We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions. RESULTS: Preliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output. CONCLUSIONS: When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting. |
format | Online Article Text |
id | pubmed-7325087 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7325087 2020-06-30 Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning Levy, Joshua J. O’Malley, A. James BMC Med Res Methodol Technical Advance BACKGROUND: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each. METHODS: We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions. RESULTS: Preliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output. CONCLUSIONS: When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting. BioMed Central 2020-06-29 /pmc/articles/PMC7325087/ /pubmed/32600277 http://dx.doi.org/10.1186/s12874-020-01046-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Technical Advance Levy, Joshua J. O’Malley, A. James Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
title | Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
title_full | Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
title_fullStr | Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
title_full_unstemmed | Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
title_short | Don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
title_sort | don’t dismiss logistic regression: the case for sensible extraction of interactions in the era of machine learning |
topic | Technical Advance |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7325087/ https://www.ncbi.nlm.nih.gov/pubmed/32600277 http://dx.doi.org/10.1186/s12874-020-01046-3 |
work_keys_str_mv | AT levyjoshuaj dontdismisslogisticregressionthecaseforsensibleextractionofinteractionsintheeraofmachinelearning AT omalleyajames dontdismisslogisticregressionthecaseforsensibleextractionofinteractionsintheeraofmachinelearning |
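
The description field above summarizes a hybrid procedure: candidate interaction features are extracted and then added to a logistic regression model. The following is only a minimal sketch of that general idea, not the InteractionTransformer package or its API. It makes several assumptions that are not in the record: the data are synthetic, the helper names (`fit_logistic`, `accuracy`, `top_interaction`) are hypothetical, and a simple covariance screen stands in for the random-forest-based extraction described in the abstract.

```python
import math
import random
from itertools import combinations


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Share of rows whose predicted class (linear score > 0) matches the label."""
    hits = 0
    for row, target in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
        hits += int((z > 0) == (target == 1))
    return hits / len(X)


def top_interaction(X, y):
    """Stand-in screen (assumption): pick the feature pair whose product term
    has the largest absolute covariance with the outcome."""
    n = len(X)
    y_mean = sum(y) / n
    best_pair, best_score = None, -1.0
    for i, j in combinations(range(len(X[0])), 2):
        prod = [row[i] * row[j] for row in X]
        p_mean = sum(prod) / n
        score = abs(sum((p - p_mean) * (t - y_mean) for p, t in zip(prod, y)) / n)
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair


# Synthetic data (assumption): the outcome depends only on the sign of x0 * x1.
random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(400)]
y = [1 if row[0] * row[1] > 0 else 0 for row in X]

# Select one candidate interaction and append it as an extra column.
pair = top_interaction(X, y)
X_aug = [row + [row[pair[0]] * row[pair[1]]] for row in X]

acc_main = accuracy(fit_logistic(X, y), X, y)          # main effects only
acc_aug = accuracy(fit_logistic(X_aug, y), X_aug, y)   # main effects + interaction
print(pair, acc_main, acc_aug)
```

In this toy setting the main-effects model cannot separate the classes, while the model with the added product term can, and the fitted coefficient on that term remains directly interpretable. Accuracy is measured on the training data here purely for brevity; a real comparison would use held-out data.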