Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts

AIMS: Deep learning has dominated predictive modelling across different fields, but in medicine it has been met with mixed reception. In clinical practice, simple, statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to the knowledge gap...

Bibliographic Details
Main authors:	Li, Yikuan; Salimi-Khorshidi, Gholamreza; Rao, Shishir; Canoy, Dexter; Hassaine, Abdelaali; Lukasiewicz, Thomas; Rahimi, Kazem; Mamouei, Mohammad
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9779795/ https://www.ncbi.nlm.nih.gov/pubmed/36710898 http://dx.doi.org/10.1093/ehjdh/ztac061

author	Li, Yikuan; Salimi-Khorshidi, Gholamreza; Rao, Shishir; Canoy, Dexter; Hassaine, Abdelaali; Lukasiewicz, Thomas; Rahimi, Kazem; Mamouei, Mohammad
collection	PubMed
description	AIMS: Deep learning has dominated predictive modelling across different fields, but in medicine it has been met with mixed reception. In clinical practice, simple, statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to the knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts; a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several ML-based and established risk models. METHODS AND RESULTS: Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests), and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions; (ii) different periods. Using internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve. CONCLUSION: The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination but if the prior distribution changes, the model may remain miscalibrated.
format	Online Article Text
id	pubmed-9779795
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9779795 2023-01-27 Eur Heart J Digit Health Original Article Oxford University Press 2022-10-21 /pmc/articles/PMC9779795/ /pubmed/36710898 http://dx.doi.org/10.1093/ehjdh/ztac061 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the European Society of Cardiology. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts
topic	Original Article

Similar Items