
Bias or biology? Importance of model interpretation in machine learning studies from electronic health records


Bibliographic Details
Main Authors:	Momenzadeh, Amanda, Shamsa, Ali, Meyer, Jesse G
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9360778/ https://www.ncbi.nlm.nih.gov/pubmed/35958671 http://dx.doi.org/10.1093/jamiaopen/ooac063

_version_	1784764399120023552
author	Momenzadeh, Amanda Shamsa, Ali Meyer, Jesse G
author_facet	Momenzadeh, Amanda Shamsa, Ali Meyer, Jesse G
author_sort	Momenzadeh, Amanda
collection	PubMed
description	OBJECTIVE: The rate of diabetic complication progression varies across individuals and understanding factors that alter the rate of complication progression may uncover new clinical interventions for personalized diabetes management. MATERIALS AND METHODS: We explore how various machine learning (ML) models and types of electronic health records (EHRs) can predict fast versus slow onset of neuropathy, nephropathy, ocular disease, or cardiovascular disease using only patient data collected prior to diabetes diagnosis. RESULTS: We find that optimized random forest models performed best to accurately predict the diagnosis of a diabetic complication, with the most effective model distinguishing between fast versus slow nephropathy (AUROC = 0.75). Using all data sets combined allowed for the highest model predictive performance, and social history or laboratory alone were most predictive. SHapley Additive exPlanations (SHAP) model interpretation allowed for exploration of predictors of fast and slow complication diagnosis, including underlying biases present in the EHR. Patients in the fast group had more medical visits, incurring a potential informed decision bias. DISCUSSION: Our study is unique in the realm of ML studies as it leverages SHAP as a starting point to explore patient markers not routinely used in diabetes monitoring. A mix of both bias and biological processes is likely present in influencing a model’s ability to distinguish between groups. CONCLUSION: Overall, model interpretation is a critical step in evaluating validity of a user-intended endpoint for a model when using EHR data, and predictors affected by bias and those driven by biologic processes should be equally recognized.
format	Online Article Text
id	pubmed-9360778
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9360778 2022-08-10 Bias or biology? Importance of model interpretation in machine learning studies from electronic health records Momenzadeh, Amanda Shamsa, Ali Meyer, Jesse G JAMIA Open Research and Applications OBJECTIVE: The rate of diabetic complication progression varies across individuals and understanding factors that alter the rate of complication progression may uncover new clinical interventions for personalized diabetes management. MATERIALS AND METHODS: We explore how various machine learning (ML) models and types of electronic health records (EHRs) can predict fast versus slow onset of neuropathy, nephropathy, ocular disease, or cardiovascular disease using only patient data collected prior to diabetes diagnosis. RESULTS: We find that optimized random forest models performed best to accurately predict the diagnosis of a diabetic complication, with the most effective model distinguishing between fast versus slow nephropathy (AUROC = 0.75). Using all data sets combined allowed for the highest model predictive performance, and social history or laboratory alone were most predictive. SHapley Additive exPlanations (SHAP) model interpretation allowed for exploration of predictors of fast and slow complication diagnosis, including underlying biases present in the EHR. Patients in the fast group had more medical visits, incurring a potential informed decision bias. DISCUSSION: Our study is unique in the realm of ML studies as it leverages SHAP as a starting point to explore patient markers not routinely used in diabetes monitoring. A mix of both bias and biological processes is likely present in influencing a model’s ability to distinguish between groups. CONCLUSION: Overall, model interpretation is a critical step in evaluating validity of a user-intended endpoint for a model when using EHR data, and predictors affected by bias and those driven by biologic processes should be equally recognized.
Oxford University Press 2022-08-08 /pmc/articles/PMC9360778/ /pubmed/35958671 http://dx.doi.org/10.1093/jamiaopen/ooac063 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Research and Applications Momenzadeh, Amanda Shamsa, Ali Meyer, Jesse G Bias or biology? Importance of model interpretation in machine learning studies from electronic health records
title	Bias or biology? Importance of model interpretation in machine learning studies from electronic health records
title_full	Bias or biology? Importance of model interpretation in machine learning studies from electronic health records
title_fullStr	Bias or biology? Importance of model interpretation in machine learning studies from electronic health records
title_full_unstemmed	Bias or biology? Importance of model interpretation in machine learning studies from electronic health records
title_short	Bias or biology? Importance of model interpretation in machine learning studies from electronic health records
title_sort	bias or biology? importance of model interpretation in machine learning studies from electronic health records
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9360778/ https://www.ncbi.nlm.nih.gov/pubmed/35958671 http://dx.doi.org/10.1093/jamiaopen/ooac063
work_keys_str_mv	AT momenzadehamanda biasorbiologyimportanceofmodelinterpretationinmachinelearningstudiesfromelectronichealthrecords AT shamsaali biasorbiologyimportanceofmodelinterpretationinmachinelearningstudiesfromelectronichealthrecords AT meyerjesseg biasorbiologyimportanceofmodelinterpretationinmachinelearningstudiesfromelectronichealthrecords
