Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction

OBJECTIVES: The Indian Liver Patient Dataset (ILPD) is used extensively to create algorithms that predict liver disease. Given the existing research describing demographic inequities in liver disease diagnosis and management, these algorithms require scrutiny for potential biases. We address this overlooked issue by investigating ILPD models for sex bias.

Bibliographic Details
Main Authors: Straw, Isabel, Wu, Honghan
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9039354/
https://www.ncbi.nlm.nih.gov/pubmed/35470133
http://dx.doi.org/10.1136/bmjhci-2021-100457
_version_ 1784694109112369152
author Straw, Isabel
Wu, Honghan
author_facet Straw, Isabel
Wu, Honghan
author_sort Straw, Isabel
collection PubMed
description OBJECTIVES: The Indian Liver Patient Dataset (ILPD) is used extensively to create algorithms that predict liver disease. Given the existing research describing demographic inequities in liver disease diagnosis and management, these algorithms require scrutiny for potential biases. We address this overlooked issue by investigating ILPD models for sex bias. METHODS: Following our literature review of ILPD papers, the models reported in existing studies are recreated and then interrogated for bias. We define four experiments, training on sex-unbalanced/balanced data, with and without feature selection. We build random forests (RFs), support vector machines (SVMs), Gaussian Naïve Bayes and logistic regression (LR) classifiers, running experiments 100 times and reporting average results with SD. RESULTS: We reproduce published models achieving accuracies of >70% (LR 71.31% (2.37 SD) to SVM 79.40% (2.50 SD)) and demonstrate a previously unobserved performance disparity. Across all classifiers, females suffer from a higher false negative rate (FNR). Presently, RF and LR classifiers are reported as the most effective models, yet in our experiments they demonstrate the greatest FNR disparity (RF: −21.02%; LR: −24.07%). DISCUSSION: We demonstrate a sex disparity that exists in published ILPD classifiers. In practice, the higher FNR for females would manifest as increased rates of missed diagnosis for female patients and a consequent lack of appropriate care. Our study demonstrates that evaluating biases in the initial stages of machine learning can provide insights into inequalities in current clinical practice, reveal pathophysiological differences between males and females, and mitigate the digitisation of inequalities into algorithmic systems. CONCLUSION: Our findings are important to medical data scientists, clinicians and policy-makers involved in the implementation of medical artificial intelligence systems. An awareness of the potential biases of these systems is essential in preventing the digital exacerbation of healthcare inequalities.
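The METHODS summary above is prose only; as a rough illustration of what such an audit could look like, the sketch below trains the four classifier families mentioned in the abstract on ILPD with scikit-learn, repeats the random split 100 times, and compares sex-stratified false negative rates. It is a minimal sketch under stated assumptions, not the authors' published pipeline: the local file name ilpd.csv, the column names Gender and Selector, the 70/30 split and the preprocessing are all assumptions for illustration.

```python
# Minimal sketch (not the authors' published pipeline): train RF, SVM, Gaussian NB
# and LR on ILPD and compare sex-stratified false negative rates over repeated
# random splits. File name, column names and preprocessing are assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("ilpd.csv")                                # assumed local copy of ILPD
df["Gender"] = (df["Gender"] == "Male").astype(int)         # 1 = male, 0 = female (assumed encoding)
y = (df["Selector"] == 1).astype(int)                       # ILPD convention: 1 = liver patient
X = df.drop(columns=["Selector"]).fillna(df.median(numeric_only=True))

models = {
    "RF": RandomForestClassifier(),
    "SVM": SVC(),
    "GNB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
}

def fnr(y_true, y_pred):
    """False negative rate: proportion of true positives predicted negative."""
    positives = y_true == 1
    return np.mean(y_pred[positives] == 0) if positives.any() else np.nan

n_runs = 100
for name, model in models.items():
    accs, gaps = [], []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed)
        scaler = StandardScaler().fit(X_tr)
        model.fit(scaler.transform(X_tr), y_tr)
        pred = model.predict(scaler.transform(X_te))
        accs.append(np.mean(pred == y_te.to_numpy()))
        is_female = X_te["Gender"].to_numpy() == 0
        y_np = y_te.to_numpy()
        # Gap = male FNR minus female FNR, so a negative value means female
        # patients are missed more often.
        gaps.append(fnr(y_np[~is_female], pred[~is_female])
                    - fnr(y_np[is_female], pred[is_female]))
    print(f"{name}: accuracy {np.mean(accs):.3f} ± {np.std(accs):.3f}, "
          f"male-minus-female FNR gap {np.nanmean(gaps):+.3f}")
```

Reporting the gap as male FNR minus female FNR is a choice made here so that a negative number corresponds to a higher miss rate for female patients, matching the direction of the disparity described in the results; the paper's exact metric definition may differ.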
format Online
Article
Text
id pubmed-9039354
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-90393542022-05-06 Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction Straw, Isabel Wu, Honghan BMJ Health Care Inform Original Research OBJECTIVES: The Indian Liver Patient Dataset (ILPD) is used extensively to create algorithms that predict liver disease. Given the existing research describing demographic inequities in liver disease diagnosis and management, these algorithms require scrutiny for potential biases. We address this overlooked issue by investigating ILPD models for sex bias. METHODS: Following our literature review of ILPD papers, the models reported in existing studies are recreated and then interrogated for bias. We define four experiments, training on sex-unbalanced/balanced data, with and without feature selection. We build random forests (RFs), support vector machines (SVMs), Gaussian Naïve Bayes and logistic regression (LR) classifiers, running experiments 100 times and reporting average results with SD. RESULTS: We reproduce published models achieving accuracies of >70% (LR 71.31% (2.37 SD) to SVM 79.40% (2.50 SD)) and demonstrate a previously unobserved performance disparity. Across all classifiers, females suffer from a higher false negative rate (FNR). Presently, RF and LR classifiers are reported as the most effective models, yet in our experiments they demonstrate the greatest FNR disparity (RF: −21.02%; LR: −24.07%). DISCUSSION: We demonstrate a sex disparity that exists in published ILPD classifiers. In practice, the higher FNR for females would manifest as increased rates of missed diagnosis for female patients and a consequent lack of appropriate care. Our study demonstrates that evaluating biases in the initial stages of machine learning can provide insights into inequalities in current clinical practice, reveal pathophysiological differences between males and females, and mitigate the digitisation of inequalities into algorithmic systems. CONCLUSION: Our findings are important to medical data scientists, clinicians and policy-makers involved in the implementation of medical artificial intelligence systems. An awareness of the potential biases of these systems is essential in preventing the digital exacerbation of healthcare inequalities. BMJ Publishing Group 2022-04-24 /pmc/articles/PMC9039354/ /pubmed/35470133 http://dx.doi.org/10.1136/bmjhci-2021-100457 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY. Published by BMJ. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
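The abstract also contrasts training on sex-unbalanced versus sex-balanced data. This record does not spell out the balancing procedure, but one common, minimal way to construct a sex-balanced training set is to downsample the over-represented sex, as in the hypothetical helper below (the Gender column name and the 50/50 target are assumptions, not taken from the paper).

```python
# Hypothetical helper: equalise the number of male and female rows in a training
# frame by downsampling the larger group. Column name "Gender" is an assumption.
import pandas as pd

def balance_by_sex(train: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Return a sex-balanced copy of `train`, sampled without replacement."""
    n_per_group = int(train["Gender"].value_counts().min())
    return (train.groupby("Gender", group_keys=False)
                 .apply(lambda g: g.sample(n=n_per_group, random_state=seed))
                 .reset_index(drop=True))
```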
spellingShingle Original Research
Straw, Isabel
Wu, Honghan
Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
title Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
title_full Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
title_fullStr Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
title_full_unstemmed Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
title_short Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
title_sort investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9039354/
https://www.ncbi.nlm.nih.gov/pubmed/35470133
http://dx.doi.org/10.1136/bmjhci-2021-100457
work_keys_str_mv AT strawisabel investigatingforbiasinhealthcarealgorithmsasexstratifiedanalysisofsupervisedmachinelearningmodelsinliverdiseaseprediction
AT wuhonghan investigatingforbiasinhealthcarealgorithmsasexstratifiedanalysisofsupervisedmachinelearningmodelsinliverdiseaseprediction