Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study

BACKGROUND: Pythia is an automated, clinically curated surgical data pipeline and repository housing all surgical patient electronic health record (EHR) data from a large, quaternary, multisite health institute for data science initiatives. In an effort to better identify high-risk surgical patients...

Bibliographic Details
Main authors: Corey, Kristin M., Kashyap, Sehj, Lorenzi, Elizabeth, Lagoo-Deenadayalan, Sandhya A., Heller, Katherine, Whalen, Krista, Balu, Suresh, Heflin, Mitchell T., McDonald, Shelley R., Swaminathan, Madhav, Sendak, Mark
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6258507/
https://www.ncbi.nlm.nih.gov/pubmed/30481172
http://dx.doi.org/10.1371/journal.pmed.1002701
_version_ 1783374508347883520
author Corey, Kristin M.
Kashyap, Sehj
Lorenzi, Elizabeth
Lagoo-Deenadayalan, Sandhya A.
Heller, Katherine
Whalen, Krista
Balu, Suresh
Heflin, Mitchell T.
McDonald, Shelley R.
Swaminathan, Madhav
Sendak, Mark
author_sort Corey, Kristin M.
collection PubMed
description BACKGROUND: Pythia is an automated, clinically curated surgical data pipeline and repository housing all surgical patient electronic health record (EHR) data from a large, quaternary, multisite health institute for data science initiatives. In an effort to better identify high-risk surgical patients from complex data, a machine learning project trained on Pythia was built to predict postoperative complication risk.
METHODS AND FINDINGS: A curated data repository of surgical outcomes was created using automated SQL and R code that extracted and processed patient clinical and surgical data across 37 million clinical encounters from the EHRs. A total of 194 clinical features including patient demographics (e.g., age, sex, race), smoking status, medications, comorbidities, procedure information, and proxies for surgical complexity were constructed and aggregated. A cohort of 66,370 patients who had undergone 99,755 invasive procedural encounters between January 1, 2014, and January 31, 2017, was studied further for the purpose of predicting postoperative complications. The average complication and 30-day postoperative mortality rates of this cohort were 16.0% and 0.51%, respectively. Least absolute shrinkage and selection operator (lasso) penalized logistic regression, random forest models, and extreme gradient boosted decision trees were trained on this surgical cohort with cross-validation on 14 specific postoperative outcome groupings. Resulting models had area under the receiver operating characteristic curve (AUC) values ranging between 0.747 and 0.924, calculated on an out-of-sample test set from the last 5 months of data. Lasso penalized regression was identified as a high-performing model, providing clinically interpretable, actionable insights. The highest- and lowest-performing lasso models predicted postoperative shock and genitourinary outcomes with AUCs of 0.924 (95% CI: 0.901, 0.946) and 0.780 (95% CI: 0.752, 0.810), respectively. A calculator requiring input of 9 data fields was created to produce a risk assessment for the 14 groupings of postoperative outcomes. A high-risk threshold (15% risk of any complication) was determined to identify high-risk surgical patients. The model's sensitivity and specificity were both 76%. Compared to heuristics that identify high-risk patients developed by clinical experts and the ACS NSQIP calculator, this tool performed better, providing an improved approach for clinicians to estimate postoperative risk for patients. Limitations of this study include the missingness of data that were removed for analysis.
CONCLUSIONS: Extracting and curating a large, local institution’s EHR data for machine learning purposes resulted in models with strong predictive performance. These models can be used in clinical settings as decision support tools for identification of high-risk patients as well as patient evaluation and care management. Further work is necessary to evaluate the impact of the Pythia risk calculator within the clinical workflow on postoperative outcomes and to optimize this data flow for future machine learning efforts.
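The abstract reports model quality as AUC and as sensitivity and specificity at a 15% high-risk threshold. The sketch below shows, under stated assumptions, how those three quantities can be computed from predicted risks. It is not the authors' code: the function names `auc` and `sensitivity_specificity`, the toy labels and risks, and the choice to flag a patient when risk is greater than or equal to the threshold are all assumptions made for illustration.

```python
# Illustrative only: toy data, not drawn from the Pythia cohort.

def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.15):
    """Flag a patient as high risk when predicted risk >= threshold (assumed rule)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = any complication) and hypothetical predicted risks.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_risk = [0.40, 0.22, 0.18, 0.09, 0.30, 0.12, 0.08, 0.05, 0.03, 0.02]

toy_auc = auc(y_true, y_risk)
toy_sens, toy_spec = sensitivity_specificity(y_true, y_risk)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

The pairwise AUC is quadratic in cohort size, which is fine for a toy example; for a cohort of the size described here, a rank-based implementation from an established statistics library would be the more practical choice.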
format Online
Article
Text
id pubmed-6258507
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6258507 2018-12-06 PLoS Med Research Article
Public Library of Science 2018-11-27 /pmc/articles/PMC6258507/ /pubmed/30481172 http://dx.doi.org/10.1371/journal.pmed.1002701 Text en © 2018 Corey et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study
topic Research Article