Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction

Bibliographic details
Main authors:	Afrose, Sharmin; Song, Wenjia; Nemeroff, Charles B.; Lu, Chang; Yao, Danfeng (Daphne)
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436942/ https://www.ncbi.nlm.nih.gov/pubmed/36059892 http://dx.doi.org/10.1038/s43856-022-00165-w

author	Afrose, Sharmin; Song, Wenjia; Nemeroff, Charles B.; Lu, Chang; Yao, Danfeng (Daphne)
collection	PubMed
description	BACKGROUND: Many clinical datasets are intrinsically imbalanced, dominated by overwhelming majority groups. Off-the-shelf machine learning models that optimize the prognosis of majority patient types (e.g., healthy class) may cause substantial errors on the minority prediction class (e.g., disease class) and demographic subgroups (e.g., Black or young patients). In the typical one-machine-learning-model-fits-all paradigm, racial and age disparities are likely to exist but go unreported. In addition, some widely used whole-population metrics give misleading results. METHODS: We design a double prioritized (DP) bias correction technique to mitigate representational biases in machine learning-based prognosis. Our method trains customized machine learning models for specific ethnicity or age groups, a substantial departure from the one-model-predicts-all convention. We compare it with other sampling and reweighting techniques in mortality and cancer survivability prediction tasks. RESULTS: We first provide empirical evidence showing various prediction deficiencies in a typical machine learning setting without bias correction. For example, in mortality prediction, missed death cases are 3.14 times as frequent as missed survival cases. Then, we show that DP consistently boosts the minority class recall for underrepresented groups, by up to 38.0%. DP also reduces relative disparities across race and age groups, e.g., up to 88.0% better than the 8 existing sampling solutions in terms of the relative disparity of minority class recall. Cross-race and cross-age-group evaluation also suggests the need for subpopulation-specific machine learning models. CONCLUSIONS: Biases exist in the widely accepted one-machine-learning-model-fits-all-population approach. We develop a bias correction method that produces specialized machine learning prognostication models for underrepresented racial and age groups.
This technique may reduce potentially life-threatening prediction mistakes for minority populations.
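The description above names two mechanics without giving formulas: prioritizing minority-class samples within an underrepresented subgroup when building training data, and measuring minority class recall per subgroup together with its relative disparity. The Python sketch below illustrates both under stated assumptions. The function names, the duplication factor `k`, and the disparity formula |r_g - r_ref| / r_ref are hypothetical choices for illustration; they are not the authors' published DP algorithm or their metric definitions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def minority_recall(y_true: Sequence[int], y_pred: Sequence[int], minority: int = 1) -> float:
    """Recall on the minority class: the share of true minority cases
    (e.g. deaths) that the model also labels as minority."""
    preds_on_minority = [p for t, p in zip(y_true, y_pred) if t == minority]
    if not preds_on_minority:
        return float("nan")  # no minority cases in this slice
    return sum(1 for p in preds_on_minority if p == minority) / len(preds_on_minority)


def recall_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable], minority: int = 1
) -> Dict[Hashable, float]:
    """Minority-class recall computed separately for each subgroup (e.g. race or age band)."""
    buckets: Dict[Hashable, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: minority_recall(ts, ps, minority) for g, (ts, ps) in buckets.items()}


def relative_disparity(recalls: Dict[Hashable, float], reference: Hashable) -> Dict[Hashable, float]:
    """ASSUMED definition: |r_g - r_ref| / r_ref, i.e. each group's recall gap
    to a reference group, relative to that group's recall (0 means parity)."""
    ref = recalls[reference]
    if ref == 0:
        raise ValueError("reference group recall must be non-zero")
    return {g: abs(r - ref) / ref for g, r in recalls.items()}


def double_prioritized_oversample(
    X: Sequence, y: Sequence[int], groups: Sequence[Hashable],
    target_group: Hashable, k: int, minority: int = 1,
) -> Tuple[list, List[int], list]:
    """HYPOTHETICAL sketch: repeat every sample that is BOTH in `target_group`
    AND in the minority class `k` extra times; all other samples are kept once.
    Rows are repeated by reference (shallow copies), not deep-copied."""
    X_out, y_out, g_out = [], [], []
    for x, t, g in zip(X, y, groups):
        copies = 1 + k if (g == target_group and t == minority) else 1
        X_out.extend([x] * copies)
        y_out.extend([t] * copies)
        g_out.extend([g] * copies)
    return X_out, y_out, g_out


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    recalls = recall_by_group(y_true, y_pred, groups)
    print(recalls)                           # {'A': 1.0, 'B': 0.5}
    print(relative_disparity(recalls, "A"))  # {'A': 0.0, 'B': 0.5}
    X_aug, y_aug, _ = double_prioritized_oversample(list(range(8)), y_true, groups, "B", k=2)
    print(len(X_aug))                        # 12
```

In this toy run, group B's two minority cases are each tripled, so a model retrained on the augmented set sees group-B positives more often. Building one such augmented set per target subgroup, and choosing `k` by validation, would mirror the subgroup-specific models described above; both steps are assumptions of this sketch rather than details taken from the record.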
format	Online Article Text
id	pubmed-9436942
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-9436942 2022-09-03; Commun Med (Lond); Nature Publishing Group UK 2022-09-01; /pmc/articles/PMC9436942/ /pubmed/36059892 http://dx.doi.org/10.1038/s43856-022-00165-w; Text; en; © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9436942/ https://www.ncbi.nlm.nih.gov/pubmed/36059892 http://dx.doi.org/10.1038/s43856-022-00165-w
