Statistical quantification of confounding bias in machine learning models
BACKGROUND: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis of the model being unconfounded.
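The null hypothesis of "the model being unconfounded" can be read as the prediction being independent of the confounder given the true target. As a rough illustration of that idea only, the sketch below runs a naive permutation test on the partial correlation between predictions and a confounder after regressing out the target; this is not the nonparametric procedure implemented in mlconfound (which is designed to remain valid for nonnormal and nonlinear dependencies), and all function names and the simulated data here are illustrative.

```python
# Illustrative sketch of a confounding-bias check: test whether predictions
# (yhat) carry information about a confounder (c) beyond the true target (y).
# This is a simplified partial-correlation permutation test, NOT the exact
# partial confounder test; see https://mlconfound.readthedocs.io for the
# actual implementation.
import numpy as np


def residualize(a, y):
    """Remove the linear effect of y from a via ordinary least squares."""
    X = np.column_stack([np.ones_like(y), y])
    beta, *_ = np.linalg.lstsq(X, a, rcond=None)
    return a - X @ beta


def partial_confound_test_sketch(y, yhat, c, n_perm=5000, seed=0):
    """Naive permutation p-value for association of yhat and c, given y."""
    rng = np.random.default_rng(seed)
    r_yhat = residualize(yhat, y)
    r_c = residualize(c, y)
    # Observed partial correlation between predictions and confounder given y.
    obs = np.corrcoef(r_yhat, r_c)[0, 1]
    # Null distribution from permuting the residualized confounder.
    null = np.empty(n_perm)
    for i in range(n_perm):
        null[i] = np.corrcoef(r_yhat, rng.permutation(r_c))[0, 1]
    # Two-sided permutation p-value with the standard +1 correction.
    p = (np.sum(np.abs(null) >= np.abs(obs)) + 1) / (n_perm + 1)
    return obs, p


# Simulated example: a "prediction" that leaks the confounder beyond the target.
rng = np.random.default_rng(42)
n = 500
c = rng.normal(size=n)                          # confounder
y = 0.5 * c + rng.normal(size=n)                # target partially driven by c
yhat = 0.6 * y + 0.4 * c + rng.normal(size=n)   # confounded prediction
print(partial_confound_test_sketch(y, yhat, c))
```

For actual analyses, the mlconfound package referenced in the abstract (https://mlconfound.readthedocs.io) implements the test itself; consult its documentation for the supported API.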
Main author: | Spisak, Tamas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9412867/ https://www.ncbi.nlm.nih.gov/pubmed/36017878 http://dx.doi.org/10.1093/gigascience/giac082 |
_version_ | 1784775598087864320 |
---|---|
author | Spisak, Tamas |
author_facet | Spisak, Tamas |
author_sort | Spisak, Tamas |
collection | PubMed |
description | BACKGROUND: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis of the model being unconfounded. RESULTS: The test provides strict control of type I errors and high statistical power, even for nonnormally and nonlinearly dependent predictions, often seen in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail to prevent confounding bias in several cases. CONCLUSIONS: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers. |
format | Online Article Text |
id | pubmed-9412867 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9412867 2022-08-29 Statistical quantification of confounding bias in machine learning models Spisak, Tamas Gigascience Research BACKGROUND: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis of the model being unconfounded. RESULTS: The test provides strict control of type I errors and high statistical power, even for nonnormally and nonlinearly dependent predictions, often seen in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail to prevent confounding bias in several cases. CONCLUSIONS: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers. Oxford University Press 2022-08-26 /pmc/articles/PMC9412867/ /pubmed/36017878 http://dx.doi.org/10.1093/gigascience/giac082 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Spisak, Tamas Statistical quantification of confounding bias in machine learning models |
title | Statistical quantification of confounding bias in machine learning models |
title_full | Statistical quantification of confounding bias in machine learning models |
title_fullStr | Statistical quantification of confounding bias in machine learning models |
title_full_unstemmed | Statistical quantification of confounding bias in machine learning models |
title_short | Statistical quantification of confounding bias in machine learning models |
title_sort | statistical quantification of confounding bias in machine learning models |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9412867/ https://www.ncbi.nlm.nih.gov/pubmed/36017878 http://dx.doi.org/10.1093/gigascience/giac082 |
work_keys_str_mv | AT spisaktamas statisticalquantificationofconfoundingbiasinmachinelearningmodels |