Estimating real-world performance of a predictive model: a case-study in predicting mortality

OBJECTIVE: One primary consideration when developing predictive models is downstream effects on future model performance. We conduct experiments to quantify the effects of experimental design choices, namely cohort selection and internal validation methods, on (estimated) real-world model performanc...

Bibliographic Details
Main Authors:	Major, Vincent J; Jethani, Neil; Aphinyanaphongs, Yindalon
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382635/ https://www.ncbi.nlm.nih.gov/pubmed/32734165 http://dx.doi.org/10.1093/jamiaopen/ooaa008

author	Major, Vincent J; Jethani, Neil; Aphinyanaphongs, Yindalon
collection	PubMed
description	OBJECTIVE: One primary consideration when developing predictive models is downstream effects on future model performance. We conduct experiments to quantify the effects of experimental design choices, namely cohort selection and internal validation methods, on (estimated) real-world model performance.
MATERIALS AND METHODS: Four years of hospitalizations are used to develop a 1-year mortality prediction model (composite of death or initiation of hospice care). Two common methods to select appropriate patient visits from their encounter history (backwards-from-outcome and forwards-from-admission) are combined with 2 testing cohorts (random and temporal validation). Two models are trained under otherwise identical conditions, and their performances compared. Operating thresholds are selected in each test set and applied to a “real-world” cohort of labeled admissions from another, unused year.
RESULTS: Backwards-from-outcome cohort selection retains 25% of candidate admissions (n = 23 579), whereas forwards-from-admission selection includes many more (n = 92 148). Both selection methods produce similar performances when applied to a random test set. However, when applied to the temporally defined “real-world” set, forwards-from-admission yields higher areas under the ROC and precision recall curves (88.3% and 56.5% vs. 83.2% and 41.6%).
DISCUSSION: A backwards-from-outcome experiment manipulates raw training data, simplifying the experiment. This manipulated data no longer resembles real-world data, resulting in optimistic estimates of test set performance, especially at high precision. In contrast, a forwards-from-admission experiment with a temporally separated test set consistently and conservatively estimates real-world performance.
CONCLUSION: Experimental design choices impose bias upon selected cohorts. A forwards-from-admission experiment, validated temporally, can conservatively estimate real-world performance.
LAY SUMMARY: The routine care of patients stands to benefit greatly from assistive technologies, including data-driven risk assessment. Already, many different machine learning and artificial intelligence applications are being developed from complex electronic health record data. To overcome challenges that arise from such data, researchers often start with simple experimental approaches to test their work. One key component is how patients (and their healthcare visits) are selected for the study from the pool of all patients seen. Another is how the group of patients used to create the risk estimator differs from the group used to evaluate how well it works. These choices complicate how the experimental setting compares to the real-world application to patients. For example, different selection approaches that depend on each patient’s future outcome can simplify the experiment but are impractical upon implementation as these data are unavailable. We show that this kind of “backwards” experiment optimistically estimates how well the model performs. Instead, our results advocate for experiments that select patients in a “forwards” manner and “temporal” validation that approximates training on past data and implementing on future data. More robust results help gauge the clinical utility of recent works and aid decision-making before implementation into practice.
format	Online Article Text
id	pubmed-7382635
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7382635 2020-07-29 Estimating real-world performance of a predictive model: a case-study in predicting mortality Major, Vincent J; Jethani, Neil; Aphinyanaphongs, Yindalon JAMIA Open Research and Applications Oxford University Press 2020-04-26 /pmc/articles/PMC7382635/ /pubmed/32734165 http://dx.doi.org/10.1093/jamiaopen/ooaa008 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Estimating real-world performance of a predictive model: a case-study in predicting mortality
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7382635/ https://www.ncbi.nlm.nih.gov/pubmed/32734165 http://dx.doi.org/10.1093/jamiaopen/ooaa008
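
The validation design summarized in the description (random versus temporal test sets, with an operating threshold chosen on the test set and then applied to a later, unused year) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the record layout (admission year, risk score, outcome), the function names, and the rule of picking the lowest score that reaches a target precision are all assumptions made here for the example.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical record layout (an assumption, not from the article):
# (admission_year, model_risk_score, outcome_within_1_year)
Admission = Tuple[int, float, bool]


def random_split(rows: List[Admission], test_frac: float = 0.25,
                 seed: int = 0) -> Tuple[List[Admission], List[Admission]]:
    """Random validation: shuffle all admissions and hold out a fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(rows: List[Admission], first_test_year: int
                   ) -> Tuple[List[Admission], List[Admission]]:
    """Temporal validation: train on earlier years, test on later ones."""
    train = [r for r in rows if r[0] < first_test_year]
    test = [r for r in rows if r[0] >= first_test_year]
    return train, test


def precision_at(rows: List[Admission], threshold: float) -> Optional[float]:
    """Precision when flagging every admission scoring >= threshold."""
    flagged = [r for r in rows if r[1] >= threshold]
    if not flagged:
        return None  # nothing flagged, precision is undefined
    return sum(1 for r in flagged if r[2]) / len(flagged)


def threshold_for_precision(rows: List[Admission],
                            target: float) -> Optional[float]:
    """Lowest observed score whose flagged set reaches the target precision."""
    for t in sorted({r[1] for r in rows}):
        p = precision_at(rows, t)
        if p is not None and p >= target:
            return t
    return None
```

In use, a threshold would be picked with `threshold_for_precision` on the test portion of either split and then passed to `precision_at` on admissions from a separate, later year; a gap between the two precisions would suggest that the test set gave an optimistic estimate.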
