Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset
The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models.

Main Authors: | Meng, Chuizheng; Trinh, Loc; Xu, Nan; Enouen, James; Liu, Yan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065125/ https://www.ncbi.nlm.nih.gov/pubmed/35504931 http://dx.doi.org/10.1038/s41598-022-11012-2 |
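As context for the description field in the record below: the abstract reports that models rely on demographic features unequally across subgroups, and that feature importance from interpretability methods can help quantify such disparities. The following is a minimal, hypothetical sketch of that idea, not the authors' actual pipeline; the synthetic data, the stand-in logistic model, and all variable names are assumptions. It computes permutation feature importance separately per demographic subgroup and reports the per-feature gap.

```python
# Minimal sketch (not the paper's code): compare per-subgroup permutation
# feature importances to flag features a model relies on unequally.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: rows are ICU stays; column 5 plays the role of a
# sensitive attribute (e.g., a race indicator) that also defines subgroups.
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + 0.5 * X[:, 5] + rng.normal(size=1000) > 0).astype(int)
group = (X[:, 5] > 0).astype(int)

model = LogisticRegression().fit(X, y)  # stand-in for a deep model

# Permutation importance within each subgroup: AUROC drop when a feature
# is shuffled, computed on that subgroup's rows only.
importances = {}
for g in (0, 1):
    mask = group == g
    result = permutation_importance(
        model, X[mask], y[mask], scoring="roc_auc",
        n_repeats=30, random_state=0,
    )
    importances[g] = result.importances_mean

# Large per-feature gaps indicate unequal reliance across subgroups.
gap = np.abs(importances[0] - importances[1])
for j, g_j in enumerate(gap):
    print(f"feature {j}: importance gap = {g_j:.3f}")
```

In practice one would swap in the trained mortality model and real MIMIC-IV features; the point is only that subgroup-wise importance gaps give a concrete number for the "unequal reliance" finding the abstract describes.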
_version_ | 1784699516587343872 |
---|---|
author | Meng, Chuizheng; Trinh, Loc; Xu, Nan; Enouen, James; Liu, Yan |
author_facet | Meng, Chuizheng; Trinh, Loc; Xu, Nan; Enouen, James; Liu, Yan |
author_sort | Meng, Chuizheng |
collection | PubMed |
description | The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of the interpretability, dataset representation bias, and prediction fairness of deep learning models for in-hospital mortality prediction. First, we analyze the interpretability of deep learning mortality prediction models and observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction across various prediction models and also recognizes important features that domain knowledge does not consider; (2) prediction models rely on demographic features, raising fairness concerns. We therefore evaluate the fairness of the models and indeed observe unfairness: (1) there is disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender, and age; (2) models often rely on racial attributes unequally across subgroups to generate their predictions. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can help quantify potential disparities in mortality predictors. Our analysis demonstrates that prediction performance is not the only factor to consider when evaluating models for healthcare applications, since high prediction performance may result from unfair use of demographic features. Our findings suggest that future research on AI models for healthcare can benefit from adopting this interpretability and fairness analysis workflow and from verifying whether models achieve superior performance at the cost of introducing bias. (Minimal illustrative sketches of both kinds of disparity checks appear immediately before and after these record fields.) |
format | Online Article Text |
id | pubmed-9065125 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9065125 2022-05-04 Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset Meng, Chuizheng; Trinh, Loc; Xu, Nan; Enouen, James; Liu, Yan Sci Rep Article The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of the interpretability, dataset representation bias, and prediction fairness of deep learning models for in-hospital mortality prediction. First, we analyze the interpretability of deep learning mortality prediction models and observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction across various prediction models and also recognizes important features that domain knowledge does not consider; (2) prediction models rely on demographic features, raising fairness concerns. We therefore evaluate the fairness of the models and indeed observe unfairness: (1) there is disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender, and age; (2) models often rely on racial attributes unequally across subgroups to generate their predictions. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can help quantify potential disparities in mortality predictors. Our analysis demonstrates that prediction performance is not the only factor to consider when evaluating models for healthcare applications, since high prediction performance may result from unfair use of demographic features. Our findings suggest that future research on AI models for healthcare can benefit from adopting this interpretability and fairness analysis workflow and from verifying whether models achieve superior performance at the cost of introducing bias. Nature Publishing Group UK 2022-05-03 /pmc/articles/PMC9065125/ /pubmed/35504931 http://dx.doi.org/10.1038/s41598-022-11012-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Meng, Chuizheng; Trinh, Loc; Xu, Nan; Enouen, James; Liu, Yan Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset |
title | Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset |
title_full | Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset |
title_fullStr | Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset |
title_full_unstemmed | Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset |
title_short | Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset |
title_sort | interpretability and fairness evaluation of deep learning models on mimic-iv dataset |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065125/ https://www.ncbi.nlm.nih.gov/pubmed/35504931 http://dx.doi.org/10.1038/s41598-022-11012-2 |
work_keys_str_mv | AT mengchuizheng interpretabilityandfairnessevaluationofdeeplearningmodelsonmimicivdataset AT trinhloc interpretabilityandfairnessevaluationofdeeplearningmodelsonmimicivdataset AT xunan interpretabilityandfairnessevaluationofdeeplearningmodelsonmimicivdataset AT enouenjames interpretabilityandfairnessevaluationofdeeplearningmodelsonmimicivdataset AT liuyan interpretabilityandfairnessevaluationofdeeplearningmodelsonmimicivdataset |
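The description field above also reports disparate treatment in prescribing mechanical ventilation across ethnicity, gender, and age. Below is a minimal sketch of that kind of rate-gap check on hypothetical data; the `ventilated` and `group` columns are assumptions for illustration, not the paper's actual variables.

```python
# Minimal sketch (hypothetical data): disparate treatment measured as the
# spread in mechanical-ventilation rates across demographic groups.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-stay records: 1 = received mechanical ventilation.
ventilated = rng.integers(0, 2, size=500)
group = rng.integers(0, 3, size=500)  # e.g., three ethnicity categories

# Ventilation rate per group; the max-min spread is a simple
# demographic-parity-style disparity measure (0 means equal rates).
rates = {int(g): float(ventilated[group == g].mean()) for g in np.unique(group)}
spread = max(rates.values()) - min(rates.values())
print(rates)
print(f"disparity spread = {spread:.3f}")
```

On real data the same comparison would condition on clinical severity before attributing the gap to treatment disparity; the unconditional spread shown here is only a first-pass screen.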