Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups

Bibliographic Details
Main Authors: Thompson, Hale M, Sharma, Brihat, Bhalla, Sameer, Boley, Randy, McCluskey, Connor, Dligach, Dmitriy, Churpek, Matthew M, Karnik, Niranjan S, Afshar, Majid
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Research and Applications
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510285/
https://www.ncbi.nlm.nih.gov/pubmed/34383925
http://dx.doi.org/10.1093/jamia/ocab148
_version_ 1784582539564810240
author Thompson, Hale M
Sharma, Brihat
Bhalla, Sameer
Boley, Randy
McCluskey, Connor
Dligach, Dmitriy
Churpek, Matthew M
Karnik, Niranjan S
Afshar, Majid
author_facet Thompson, Hale M
Sharma, Brihat
Bhalla, Sameer
Boley, Randy
McCluskey, Connor
Dligach, Dmitriy
Churpek, Matthew M
Karnik, Niranjan S
Afshar, Majid
author_sort Thompson, Hale M
collection PubMed
description OBJECTIVES: To assess fairness and bias of a previously validated machine learning opioid misuse classifier. MATERIALS & METHODS: Two experiments were conducted with the classifier’s original (n = 1000) and external validation (n = 53 974) datasets from 2 health systems. Bias was assessed via testing for differences in type II error rates across racial/ethnic subgroups (Black, Hispanic/Latinx, White, Other) using bootstrapped 95% confidence intervals. A local surrogate model was estimated to interpret the classifier’s predictions by race and averaged globally from the datasets. Subgroup analyses and post-hoc recalibrations were conducted to attempt to mitigate biased metrics. RESULTS: We identified bias in the false negative rate (FNR = 0.32) of the Black subgroup compared to the FNR (0.17) of the White subgroup. Top features included “heroin” and “substance abuse” across subgroups. Post-hoc recalibrations eliminated bias in FNR with minimal changes in other subgroup error metrics. The Black FNR subgroup had higher risk scores for readmission and mortality than the White FNR subgroup, and a higher mortality risk score than the Black true positive subgroup (P < .05). DISCUSSION: The Black FNR subgroup had the greatest severity of disease and risk for poor outcomes. Similar features were present between subgroups for predicting opioid misuse, but inequities were present. Post-hoc mitigation techniques mitigated bias in type II error rate without creating substantial type I error rates. From model design through deployment, bias and data disadvantages should be systematically addressed. CONCLUSION: Standardized, transparent bias assessments are needed to improve trustworthiness in clinical machine learning models.
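The bias-detection step described above (comparing type II error rates across racial/ethnic subgroups with bootstrapped 95% confidence intervals) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes binary gold labels and hard predictions in a pandas DataFrame, and the column names and synthetic data are placeholders.

```python
# Minimal sketch of the bias audit: per-subgroup false negative rates
# (type II error) with percentile-bootstrap 95% CIs. All data below are
# synthetic placeholders; column names are assumptions, not the study's schema.
import numpy as np
import pandas as pd

def fnr(y_true, y_pred):
    """False negative rate: share of true positives predicted negative."""
    positives = y_true == 1
    return np.nan if positives.sum() == 0 else float(np.mean(y_pred[positives] == 0))

def bootstrap_fnr_ci(sub, n_boot=2000, alpha=0.05):
    """Point estimate and percentile bootstrap CI for one subgroup's FNR."""
    stats = [
        fnr(s["y_true"].to_numpy(), s["y_pred"].to_numpy())
        for s in (sub.sample(len(sub), replace=True) for _ in range(n_boot))
    ]
    lo, hi = np.nanquantile(stats, [alpha / 2, 1 - alpha / 2])
    return fnr(sub["y_true"].to_numpy(), sub["y_pred"].to_numpy()), lo, hi

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y_true": rng.integers(0, 2, 1000),
    "y_pred": rng.integers(0, 2, 1000),
    "race": rng.choice(["Black", "Hispanic/Latinx", "White", "Other"], 1000),
})
for group, sub in df.groupby("race"):
    point, lo, hi = bootstrap_fnr_ci(sub)
    print(f"{group}: FNR={point:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Non-overlapping subgroup confidence intervals are the kind of signal this check surfaces, e.g., the study's Black-subgroup FNR of 0.32 against the White-subgroup FNR of 0.17.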
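The "local surrogate model" used for interpretation is the approach popularized by LIME: fit an interpretable linear model around each individual prediction, then aggregate. The toy sketch below uses the open-source lime package with a stand-in scikit-learn text classifier; the texts, labels, and model are placeholders for the study's classifier and EHR notes, not a reproduction of them.

```python
# Toy LIME-style local surrogate: explain one note's prediction with an
# interpretable linear model fit around it. Classifier and texts are
# stand-ins for the study's opioid-misuse model and clinical notes.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["heroin use noted", "history of substance abuse",
         "no drug use reported", "routine follow-up visit"] * 25
labels = [1, 1, 0, 0] * 25
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

explainer = LimeTextExplainer(class_names=["no misuse", "misuse"])
explanation = explainer.explain_instance(
    "patient reports heroin and substance abuse",  # one note as a string
    model.predict_proba,   # black-box text -> class-probability function
    num_features=5,        # top locally weighted tokens
)
print(explanation.as_list())  # [(token, local weight), ...]
```

Averaging such local token weights over many notes, stratified by race, yields the global, per-subgroup feature rankings (e.g., "heroin", "substance abuse") reported above.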
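One common way to implement the post-hoc recalibration described above (eliminating the FNR gap with minimal changes to other error metrics) is an equal-opportunity-style post-processing step: pick a separate decision threshold per subgroup so each subgroup's FNR meets a shared target. The sketch below assumes continuous risk scores and synthetic data; it is a generic illustration, not necessarily the paper's exact procedure.

```python
# Sketch: per-subgroup decision thresholds chosen so each subgroup's FNR
# stays at or below a shared target (e.g., the best-off subgroup's 0.17).
# Scores, labels, and groups are synthetic placeholders.
import numpy as np

def threshold_for_target_fnr(scores, y_true, target_fnr):
    """Highest threshold keeping FNR (positives scored below it) <= target."""
    pos = np.sort(scores[y_true == 1])
    k = int(np.floor(target_fnr * len(pos)))  # missed positives allowed
    return pos[min(k, len(pos) - 1)]

rng = np.random.default_rng(1)
scores = rng.uniform(size=2000)
y_true = rng.integers(0, 2, 2000)
race = rng.choice(["Black", "Hispanic/Latinx", "White", "Other"], 2000)

thresholds = {
    g: threshold_for_target_fnr(scores[race == g], y_true[race == g], 0.17)
    for g in np.unique(race)
}
# Apply each record's subgroup-specific threshold.
y_pred = (scores >= np.vectorize(thresholds.get)(race)).astype(int)
```

Lowering a disadvantaged subgroup's threshold trades some additional false positives for fewer false negatives, matching the abstract's finding of mitigated type II error without substantial new type I error.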
format Online
Article
Text
id pubmed-8510285
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8510285 2021-10-13 Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups Thompson, Hale M Sharma, Brihat Bhalla, Sameer Boley, Randy McCluskey, Connor Dligach, Dmitriy Churpek, Matthew M Karnik, Niranjan S Afshar, Majid J Am Med Inform Assoc Research and Applications OBJECTIVES: To assess fairness and bias of a previously validated machine learning opioid misuse classifier. MATERIALS & METHODS: Two experiments were conducted with the classifier’s original (n = 1000) and external validation (n = 53 974) datasets from 2 health systems. Bias was assessed via testing for differences in type II error rates across racial/ethnic subgroups (Black, Hispanic/Latinx, White, Other) using bootstrapped 95% confidence intervals. A local surrogate model was estimated to interpret the classifier’s predictions by race and averaged globally from the datasets. Subgroup analyses and post-hoc recalibrations were conducted to attempt to mitigate biased metrics. RESULTS: We identified bias in the false negative rate (FNR = 0.32) of the Black subgroup compared to the FNR (0.17) of the White subgroup. Top features included “heroin” and “substance abuse” across subgroups. Post-hoc recalibrations eliminated bias in FNR with minimal changes in other subgroup error metrics. The Black FNR subgroup had higher risk scores for readmission and mortality than the White FNR subgroup, and a higher mortality risk score than the Black true positive subgroup (P < .05). DISCUSSION: The Black FNR subgroup had the greatest severity of disease and risk for poor outcomes. Similar features were present between subgroups for predicting opioid misuse, but inequities were present. Post-hoc mitigation techniques mitigated bias in type II error rate without creating substantial type I error rates. From model design through deployment, bias and data disadvantages should be systematically addressed. CONCLUSION: Standardized, transparent bias assessments are needed to improve trustworthiness in clinical machine learning models. Oxford University Press 2021-08-12 /pmc/articles/PMC8510285/ /pubmed/34383925 http://dx.doi.org/10.1093/jamia/ocab148 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Research and Applications
Thompson, Hale M
Sharma, Brihat
Bhalla, Sameer
Boley, Randy
McCluskey, Connor
Dligach, Dmitriy
Churpek, Matthew M
Karnik, Niranjan S
Afshar, Majid
Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
title Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
title_full Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
title_fullStr Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
title_full_unstemmed Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
title_short Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
title_sort bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510285/
https://www.ncbi.nlm.nih.gov/pubmed/34383925
http://dx.doi.org/10.1093/jamia/ocab148
work_keys_str_mv AT thompsonhalem biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT sharmabrihat biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT bhallasameer biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT boleyrandy biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT mccluskeyconnor biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT dligachdmitriy biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT churpekmatthewm biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT karnikniranjans biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups
AT afsharmajid biasandfairnessassessmentofanaturallanguageprocessingopioidmisuseclassifierdetectionandmitigationofelectronichealthrecorddatadisadvantagesacrossracialsubgroups