Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease

Bibliographic Details
Main authors: Steele, Andrew J., Denaxas, Spiros C., Shah, Anoop D., Hemingway, Harry, Luscombe, Nicholas M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6118376/
https://www.ncbi.nlm.nih.gov/pubmed/30169498
http://dx.doi.org/10.1371/journal.pone.0202344
Abstract: Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputing missing values are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is both high and may carry meaning. Using a cohort of 80,000 patients from the CALIBER programme, we compared traditional modelling and machine-learning approaches in EHR. First, we used Cox models and random survival forests with and without imputation on 27 expert-selected, preprocessed variables to predict all-cause mortality. We then used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models used on an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputing missing values. An elastic net Cox regression based on 586 unimputed variables, with continuous values discretised, achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared to 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate. This demonstrates that machine-learning approaches applied to raw EHR data can be used to build models for use in research and clinical practice, and to identify novel predictive variables and their effects to inform future research.
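For readers unfamiliar with two techniques the abstract relies on, here is a minimal Python sketch. It is not the authors' code. The first idea is that continuous variables are split into quantile bins, and missing values get their own indicator column, so a Cox-type model can use unimputed data. The second is Harrell's C-index, the discrimination metric quoted in the abstract, which counts how often the patient who died earlier was assigned the higher risk. The record does not say how many bins the authors used or how ties were handled. The bin count, the function names (`quantile_cutpoints`, `discretise`, `harrell_c_index`) and the toy data are therefore illustrative assumptions.

```python
import bisect
from typing import Dict, List, Optional, Sequence


def quantile_cutpoints(values: Sequence[Optional[float]], n_bins: int) -> List[float]:
    """Interior cut points splitting the observed (non-missing) values into roughly equal-count bins.

    Assumption: simple quantile binning; the authors' exact scheme is not given in this record.
    """
    observed = sorted(v for v in values if v is not None)
    if not observed or n_bins < 2:
        return []
    cuts = [observed[(k * len(observed)) // n_bins] for k in range(1, n_bins)]
    # Tied values can produce repeated cut points; keep each one once.
    return sorted(set(cuts))


def discretise(values: Sequence[Optional[float]], cuts: List[float]) -> List[Dict[str, int]]:
    """One-hot encode each value into a bin column, with None mapped to its own 'missing' column."""
    labels = [f"bin{i}" for i in range(len(cuts) + 1)] + ["missing"]
    rows = []
    for v in values:
        row = dict.fromkeys(labels, 0)
        if v is None:
            row["missing"] = 1  # missingness is kept as information, not imputed
        else:
            # bisect_right counts the cut points <= v, which is the 0-based bin index
            row[f"bin{bisect.bisect_right(cuts, v)}"] = 1
        rows.append(row)
    return rows


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs where the earlier death had the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j is known to have survived past i's death; tied times are skipped
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: one lab value per patient, None where it was never recorded.
    values = [5.0, None, 7.5, 3.2, None, 9.1, 6.0, 4.4]
    cuts = quantile_cutpoints(values, 3)
    print(cuts)  # [5.0, 7.5]
    for v, row in zip(values, discretise(values, cuts)):
        print(v, row)
    # Hypothetical follow-up times, death indicators (1 = died) and model risk scores.
    print(harrell_c_index([2, 5, 3, 8], [1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1]))  # 1.0
```

The indicator columns for each patient could then be passed to a penalised (for example, elastic net) Cox fit. Because each bin gets its own coefficient, a nonlinear association with mortality shows up directly as a pattern across the bins. A bootstrapped interval like the one reported would come from recomputing the index on resampled patients.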
Journal: PLoS One, Research Article, published 2018-08-31
Rights: © 2018 Steele et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.