Predicting the risk of emergency admission with machine learning: Development and validation using linked electronic health records

BACKGROUND: Emergency admissions are a major source of healthcare spending. We aimed to derive, validate, and compare conventional and machine learning models for prediction of the first emergency admission. Machine learning methods are capable of capturing complex interactions that are likely to be...

Bibliographic Details
Main authors: Rahimian, Fatemeh, Salimi-Khorshidi, Gholamreza, Payberah, Amir H., Tran, Jenny, Ayala Solares, Roberto, Raimondi, Francesca, Nazarzadeh, Milad, Canoy, Dexter, Rahimi, Kazem
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245681/
https://www.ncbi.nlm.nih.gov/pubmed/30458006
http://dx.doi.org/10.1371/journal.pmed.1002695
author Rahimian, Fatemeh
Salimi-Khorshidi, Gholamreza
Payberah, Amir H.
Tran, Jenny
Ayala Solares, Roberto
Raimondi, Francesca
Nazarzadeh, Milad
Canoy, Dexter
Rahimi, Kazem
collection PubMed
description BACKGROUND: Emergency admissions are a major source of healthcare spending. We aimed to derive, validate, and compare conventional and machine learning models for prediction of the first emergency admission. Machine learning methods are capable of capturing complex interactions that are likely to be present when predicting less specific outcomes, such as this one. METHODS AND FINDINGS: We used longitudinal data from linked electronic health records of 4.6 million patients aged 18–100 years from 389 practices across England between 1985 and 2015. The population was divided into a derivation cohort (80%, 3.75 million patients from 300 general practices) and a validation cohort (20%, 0.88 million patients from 89 general practices) from geographically distinct regions with different risk levels. We first replicated a previously reported Cox proportional hazards (CPH) model for prediction of the risk of the first emergency admission up to 24 months after baseline. This reference model was then compared with 2 machine learning models, random forest (RF) and gradient boosting classifier (GBC). The initial set of predictors for all models included 43 variables, including patient demographics, lifestyle factors, laboratory tests, currently prescribed medications, selected morbidities, and previous emergency admissions. We then added 13 more variables (marital status, prior general practice visits, and 11 additional morbidities), and also enriched all variables by incorporating temporal information whenever possible (e.g., time since first diagnosis). We also varied the prediction windows to 12, 36, 48, and 60 months after baseline and compared model performances. For internal validation, we used 5-fold cross-validation.
When the initial set of variables was used, GBC outperformed RF and CPH, with an area under the receiver operating characteristic curve (AUC) of 0.779 (95% CI 0.777, 0.781), compared to 0.752 (95% CI 0.751, 0.753) and 0.740 (95% CI 0.739, 0.741), respectively. In external validation, we observed an AUC of 0.796, 0.736, and 0.736 for GBC, RF, and CPH, respectively. The addition of temporal information improved AUC across all models. In internal validation, the AUC rose to 0.848 (95% CI 0.847, 0.849), 0.825 (95% CI 0.824, 0.826), and 0.805 (95% CI 0.804, 0.806) for GBC, RF, and CPH, respectively, while the AUC in external validation rose to 0.826, 0.810, and 0.788, respectively. This enhancement also resulted in robust predictions for longer time horizons, with AUC values remaining at similar levels across all models. Overall, compared to the baseline reference CPH model, the final GBC model showed a 10.8% higher AUC (0.848 compared to 0.740) for prediction of risk of emergency admission within 24 months. GBC also showed the best calibration throughout the risk spectrum. Despite the wide range of variables included in models, our study was still limited by the number of variables included; inclusion of more variables could have further improved model performances. CONCLUSIONS: The use of machine learning and addition of temporal information led to substantially improved discrimination and calibration for predicting the risk of emergency admission. Model performance remained stable across a range of prediction time windows and when externally validated. These findings support the potential of incorporating machine learning models into electronic health records to inform care and service planning.
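
The "10.8% higher AUC (0.848 compared to 0.740)" quoted in the description appears to match the absolute difference between the two AUC values, expressed in percentage points, rather than a relative increase. The following is a minimal sketch of that arithmetic, using only the two internal-validation AUCs quoted above; the helper name `auc_gain` and the variable names are hypothetical and do not come from the article.

```python
from typing import Tuple

# AUC values quoted in the description (internal validation, 24-month window).
auc_cph_baseline = 0.740  # reference Cox model, initial set of 43 variables
auc_gbc_final = 0.848     # gradient boosting classifier with temporal variables


def auc_gain(reference: float, candidate: float) -> Tuple[float, float]:
    """Return (absolute gain in AUC units, relative gain as a fraction)."""
    absolute = candidate - reference
    relative = absolute / reference
    return absolute, relative


absolute, relative = auc_gain(auc_cph_baseline, auc_gbc_final)
print(f"absolute gain: {absolute:.3f} ({absolute * 100:.1f} percentage points)")
print(f"relative gain: {relative * 100:.1f}%")
# prints:
# absolute gain: 0.108 (10.8 percentage points)
# relative gain: 14.6%
```

Read this way, the quoted 10.8% corresponds to 0.108 AUC units, while the same two values imply a relative increase of roughly 14.6% over the reference model.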
format Online
Article
Text
id pubmed-6245681
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6245681 2018-12-01 PLoS Med Research Article Public Library of Science 2018-11-20 /pmc/articles/PMC6245681/ /pubmed/30458006 http://dx.doi.org/10.1371/journal.pmed.1002695 Text en © 2018 Rahimian et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Predicting the risk of emergency admission with machine learning: Development and validation using linked electronic health records
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245681/
https://www.ncbi.nlm.nih.gov/pubmed/30458006
http://dx.doi.org/10.1371/journal.pmed.1002695