Resampling to address inequities in predictive modeling of suicide deaths

OBJECTIVE: Improve methodology for equitable suicide death prediction when using sensitive predictors, such as race/ethnicity, for machine learning and statistical methods. METHODS: Train predictive models (logistic regression, naive Bayes, gradient boosting (XGBoost) and random forests) using three...

Bibliographic Details
Main Authors: Reeves, Majerle; Bhat, Harish S; Goldman-Mellor, Sidra
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8996002/
https://www.ncbi.nlm.nih.gov/pubmed/35396246
http://dx.doi.org/10.1136/bmjhci-2021-100456
author Reeves, Majerle
Bhat, Harish S
Goldman-Mellor, Sidra
collection PubMed
description OBJECTIVE: Improve methodology for equitable suicide death prediction when using sensitive predictors, such as race/ethnicity, in machine learning and statistical methods.
METHODS: Train predictive models (logistic regression, naive Bayes, gradient boosting (XGBoost) and random forests) using three resampling techniques (Blind, Separate, Equity) on emergency department (ED) administrative patient records. The Blind method resamples without considering racial/ethnic group. By contrast, the Separate method trains disjoint models for each group, and the Equity method builds a training set that is balanced both by racial/ethnic group and by class.
RESULTS: Using the Blind method, the range of the models' sensitivity for predicting suicide death across racial/ethnic groups (a measure of prediction inequity) was 0.47 for logistic regression, 0.37 for naive Bayes, 0.56 for XGBoost and 0.58 for random forest. Building separate models for different racial/ethnic groups, or applying the Equity method to the training set, decreased this range to 0.16, 0.13, 0.19 and 0.20 with the Separate method and to 0.14, 0.12, 0.24 and 0.13 with the Equity method, respectively. XGBoost had the highest overall area under the curve (AUC), ranging from 0.69 to 0.79.
DISCUSSION: We increased performance equity between racial/ethnic groups and show that imbalanced training sets lead to models with poor predictive equity. These methods achieve AUC scores comparable to other work in the field, using only single ED administrative record data.
CONCLUSION: We propose two methods to improve equity of suicide death prediction among different racial/ethnic groups. These methods may be applied to other sensitive characteristics to improve equity in machine learning with healthcare applications.
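The three resampling schemes and the inequity measure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not say whether resampling is done by downsampling or upsampling, so downsampling to the smallest cell is assumed here, and the record layout (`group`, `label`), the function names and the synthetic counts are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def blind_resample(records, rng):
    """Blind (assumed form): balance the two classes, ignoring group."""
    pos = [r for r in records if r["label"] == 1]
    neg = [r for r in records if r["label"] == 0]
    n = min(len(pos), len(neg))  # assumption: downsample the larger class
    return rng.sample(pos, n) + rng.sample(neg, n)


def separate_split(records):
    """Separate: one training set per group, each feeding its own model."""
    by_group = defaultdict(list)
    for r in records:
        by_group[r["group"]].append(r)
    return dict(by_group)


def equity_resample(records, rng):
    """Equity (assumed form): equal rows in every (group, class) cell."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["group"], r["label"])].append(r)
    n = min(len(rows) for rows in cells.values())  # smallest cell sets the size
    out = []
    for key in sorted(cells):
        out.extend(rng.sample(cells[key], n))
    return out


def sensitivity_range(y_true, y_pred, groups):
    """Return (max - min of per-group sensitivity, per-group sensitivities).

    Sensitivity is TP / (TP + FN), computed over true positives of each group.
    """
    tp, pos = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            pos[g] += 1
            tp[g] += int(p == 1)
    sens = {g: tp[g] / pos[g] for g in pos}
    return max(sens.values()) - min(sens.values()), sens


# Synthetic, made-up data: group A is nine times larger than group B,
# and the positive class is rare in both.
records = (
    [{"group": "A", "label": 0}] * 900 + [{"group": "A", "label": 1}] * 90
    + [{"group": "B", "label": 0}] * 100 + [{"group": "B", "label": 1}] * 10
)

if __name__ == "__main__":
    rng = random.Random(0)
    for name, fn in [("blind", blind_resample), ("equity", equity_resample)]:
        sample = fn(records, rng)
        print(name, dict(Counter((r["group"], r["label"]) for r in sample)))
```

On this synthetic data the Blind sample is class-balanced but still dominated by group A, while the Equity sample has the same number of rows in each group-by-class cell. The `sensitivity_range` helper corresponds to the "range in performance" figures quoted in RESULTS, where a smaller value indicates more equitable sensitivity across groups.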
format Online
Article
Text
id pubmed-8996002
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-8996002 2022-04-27 Resampling to address inequities in predictive modeling of suicide deaths Reeves, Majerle Bhat, Harish S Goldman-Mellor, Sidra BMJ Health Care Inform Original Research BMJ Publishing Group 2022-04-08 /pmc/articles/PMC8996002/ /pubmed/35396246 http://dx.doi.org/10.1136/bmjhci-2021-100456 Text en
© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
title Resampling to address inequities in predictive modeling of suicide deaths
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8996002/
https://www.ncbi.nlm.nih.gov/pubmed/35396246
http://dx.doi.org/10.1136/bmjhci-2021-100456