
Statistical quantification of confounding bias in machine learning models

BACKGROUND: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis that the model is unconfounded. RESULTS: The test provides strict control of type I errors and high statistical power, even for nonnormally and nonlinearly dependent predictions, as often seen in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may fail to prevent confounder bias in several cases. CONCLUSIONS: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and thereby foster the development of clinically useful machine learning biomarkers.

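The null hypothesis probed by the partial confounder test is that the model's predictions are conditionally independent of the confounder given the true target. Below is a minimal illustrative sketch of that idea as a conditional permutation test, using only NumPy. This is not the mlconfound implementation: the function name, the quantile-binning approximation of conditioning on the target, and the correlation test statistic are all simplifying assumptions for illustration; the actual package models the conditional distribution of the confounder more carefully.

```python
import numpy as np


def partial_confound_permutation_test(y, yhat, c, n_perm=1000, n_bins=5, seed=0):
    """Illustrative conditional permutation test (NOT the mlconfound method).

    H0: yhat is independent of the confounder c, given the target y.
    We permute c only within quantile bins of y, so the permutation
    distribution preserves the y-c association, then compare the observed
    |corr(yhat, c)| against that null distribution. Coarse binning only
    approximates conditioning on y and can leak some y-mediated
    association, which the real test avoids.
    """
    rng = np.random.default_rng(seed)
    y, yhat, c = map(np.asarray, (y, yhat, c))
    observed = abs(np.corrcoef(yhat, c)[0, 1])
    # assign each sample to a quantile bin of y
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, y)
    null = np.empty(n_perm)
    for i in range(n_perm):
        c_perm = c.copy()
        for b in np.unique(bins):
            idx = np.where(bins == b)[0]
            c_perm[idx] = c[rng.permutation(idx)]  # shuffle c within the bin
        null[i] = abs(np.corrcoef(yhat, c_perm)[0, 1])
    # permutation p-value with the standard +1 correction
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# demo: simulate a confounder c that drives both the target y and a
# "confounded" model's predictions (all data here are synthetic)
rng = np.random.default_rng(42)
n = 500
c = rng.normal(size=n)                      # confounder
y = c + rng.normal(size=n)                  # target influenced by c
yhat_conf = c + 0.1 * rng.normal(size=n)    # predictions driven by c
yhat_ok = y + 0.5 * rng.normal(size=n)      # predictions driven by y
p_conf = partial_confound_permutation_test(y, yhat_conf, c, n_perm=500)
p_ok = partial_confound_permutation_test(y, yhat_ok, c, n_perm=500)
```

The confounder-driven model should yield a small p-value, while the target-driven one should typically not; because of the binning approximation, this sketch is only a conceptual analogue of the test described in the article.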

Bibliographic Details
Main Author: Spisak, Tamas
Format: Online Article Text
Language: English
Published: Oxford University Press, 2022
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9412867/
https://www.ncbi.nlm.nih.gov/pubmed/36017878
http://dx.doi.org/10.1093/gigascience/giac082
Record ID: pubmed-9412867 (collection: PubMed; record format: MEDLINE/PubMed)
Institution: National Center for Biotechnology Information
Journal: GigaScience (Research)
Published online: 2022-08-26
License: © The Author(s) 2022. Published by Oxford University Press GigaScience. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.