
Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care–Associated Infection

BACKGROUND: With the widespread adoption of electronic healthcare records (EHRs) by US hospitals, there is an opportunity to leverage this data for the development of predictive algorithms to improve clinical care. A key barrier in model development and implementation includes the external validatio...


Bibliographic Details
Main Authors:	Kiser, Amber C, Eilbeck, Karen, Ferraro, Jeffrey P, Skarda, David E, Samore, Matthew H, Bucher, Brian
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472055/ https://www.ncbi.nlm.nih.gov/pubmed/36040784 http://dx.doi.org/10.2196/39057

author	Kiser, Amber C; Eilbeck, Karen; Ferraro, Jeffrey P; Skarda, David E; Samore, Matthew H; Bucher, Brian
author_sort	Kiser, Amber C
collection	PubMed
description	BACKGROUND: With the widespread adoption of electronic healthcare records (EHRs) by US hospitals, there is an opportunity to leverage this data for the development of predictive algorithms to improve clinical care. A key barrier in model development and implementation includes the external validation of model discrimination, which is rare and often results in worse performance. One reason why machine learning models are not externally generalizable is data heterogeneity. A potential solution to address the substantial data heterogeneity between health care systems is to use standard vocabularies to map EHR data elements. The advantage of these vocabularies is a hierarchical relationship between elements, which allows the aggregation of specific clinical features to more general grouped concepts.
OBJECTIVE: This study aimed to evaluate grouping EHR data using standard vocabularies to improve the transferability of machine learning models for the detection of postoperative health care–associated infections across institutions with different EHR systems.
METHODS: Patients who underwent surgery from the University of Utah Health and Intermountain Healthcare from July 2014 to August 2017 with complete follow-up data were included. The primary outcome was a health care–associated infection within 30 days of the procedure. EHR data from 0-30 days after the operation were mapped to standard vocabularies and grouped using the hierarchical relationships of the vocabularies. Model performance was measured using the area under the receiver operating characteristic curve (AUC) and F1-score in internal and external validations. To evaluate model transferability, a difference-in-difference metric was defined as the difference in performance drop between internal and external validations for the baseline and grouped models.
RESULTS: A total of 5775 patients from the University of Utah and 15,434 patients from Intermountain Healthcare were included. The prevalence of selected outcomes was from 4.9% (761/15,434) to 5% (291/5775) for surgical site infections, from 0.8% (44/5775) to 1.1% (171/15,434) for pneumonia, from 2.6% (400/15,434) to 3% (175/5775) for sepsis, and from 0.8% (125/15,434) to 0.9% (50/5775) for urinary tract infections. In all outcomes, the grouping of data using standard vocabularies resulted in a reduced drop in AUC and F1-score in external validation compared to baseline features (all P<.001, except urinary tract infection AUC: P=.002). The difference-in-difference metrics ranged from 0.005 to 0.248 for AUC and from 0.075 to 0.216 for F1-score.
CONCLUSIONS: We demonstrated that grouping machine learning model features based on standard vocabularies improved model transferability between data sets across 2 institutions. Improving model transferability using standard vocabularies has the potential to improve the generalization of clinical prediction models across the health care system.
format	Online Article Text
id	pubmed-9472055
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9472055 2022-09-15 JMIR Med Inform Original Paper JMIR Publications 2022-08-30 /pmc/articles/PMC9472055/ /pubmed/36040784 http://dx.doi.org/10.2196/39057 Text en ©Amber C Kiser, Karen Eilbeck, Jeffrey P Ferraro, David E Skarda, Matthew H Samore, Brian Bucher. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 30.08.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care–Associated Infection
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472055/ https://www.ncbi.nlm.nih.gov/pubmed/36040784 http://dx.doi.org/10.2196/39057
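
The abstract in this record describes two steps that a small sketch may make more concrete: rolling specific EHR codes up to grouped concepts through a vocabulary hierarchy, and computing the difference-in-difference metric from internal and external validation scores. The Python below is only an illustration under stated assumptions. The code names, the `parent_of` mapping, the function names, and the example scores are all hypothetical, and the sign convention (baseline drop minus grouped drop, so that a positive value favors grouping) is an assumption chosen to agree with the positive values reported in the abstract. It is not the authors' implementation.

```python
from collections import Counter


def group_codes(codes, parent_of):
    """Count features after replacing each code by its parent concept.

    Codes that have no entry in the (hypothetical) hierarchy are kept as-is.
    """
    return Counter(parent_of.get(code, code) for code in codes)


def difference_in_difference(baseline_internal, baseline_external,
                             grouped_internal, grouped_external):
    """Difference between the baseline and grouped performance drops.

    Each drop is internal minus external performance (AUC or F1-score).
    Assumed sign convention: a positive result means the grouped model
    lost less performance in external validation than the baseline model.
    """
    baseline_drop = baseline_internal - baseline_external
    grouped_drop = grouped_internal - grouped_external
    return baseline_drop - grouped_drop


if __name__ == "__main__":
    # Hypothetical hierarchy: specific codes mapped to a grouped concept.
    parent_of = {
        "CODE_A1": "GROUP_A",
        "CODE_A2": "GROUP_A",
        "CODE_B1": "GROUP_B",
    }
    patient_codes = ["CODE_A1", "CODE_A2", "CODE_B1", "CODE_X"]
    print(group_codes(patient_codes, parent_of))

    # Hypothetical AUC values, not taken from the study.
    did = difference_in_difference(0.90, 0.70, 0.88, 0.82)
    print(round(did, 3))
```

With grouped features, two institutions that record different specific codes under the same parent concept produce the same feature, which is the mechanism the abstract credits for the smaller external performance drop.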
