
Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database

Modern predictive models require large amounts of data for training and evaluation, the absence of which may result in models that are specific to certain locations, the populations in them, and local clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability.

Full description

Bibliographic Details
Main Authors: Singh, Harvineet, Mhasawade, Vishwali, Chunara, Rumi
Format: Online Article Text
Language: English
Published: Public Library of Science, 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931319/
https://www.ncbi.nlm.nih.gov/pubmed/36812510
http://dx.doi.org/10.1371/journal.pdig.0000023
_version_ 1784889223413760000
author Singh, Harvineet
Mhasawade, Vishwali
Chunara, Rumi
collection PubMed
description Modern predictive models require large amounts of data for training and evaluation, the absence of which may result in models that are specific to certain locations, the populations in them, and local clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether the population- and group-level performance of mortality prediction models varies significantly when the models are applied to hospitals or geographies different from those in which they were developed. Further, what characteristics of the datasets explain this performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US, covering 70,126 hospitalizations from 2014 to 2015. The generalization gap, defined as the difference in model performance metrics across hospitals, was computed for the area under the receiver operating characteristic curve (AUC) and the calibration slope. To assess model performance by race, we report differences in false negative rates across groups. Data were also analyzed using the causal discovery algorithm "Fast Causal Inference" (FCI), which infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (interquartile range, IQR; median 0.801); the calibration slope from 0.725 to 0.983 (IQR; median 0.853); and the disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). The distributions of all variable types (demographics, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to specific groups. Moreover, to develop methods that improve model performance in new environments, a better understanding and documentation of the provenance of data and health processes are needed to identify and mitigate sources of variation.
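The transfer analysis summarized above is straightforward to reproduce in outline. Below is a minimal Python sketch, not the authors' code: the DataFrame layout, the column names (hospital_id, race, a binary died outcome), the decision threshold, and the logistic-regression stand-in for the mortality model are all assumptions made for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def calibration_slope(y, p, eps=1e-6):
    # Slope of a logistic regression of the outcome on the logit of the
    # predicted probability; a perfectly calibrated model has slope 1.
    p = np.clip(p, eps, 1 - eps)
    logit = np.log(p / (1 - p))
    lr = LogisticRegression(penalty=None)  # unpenalized fit (scikit-learn >= 1.2)
    lr.fit(logit.reshape(-1, 1), y)
    return lr.coef_[0, 0]

def fnr(y, yhat):
    # False negative rate: fraction of true deaths the model misses.
    deaths = y == 1
    return np.mean(yhat[deaths] == 0) if deaths.any() else np.nan

def generalization_gap(df, features, train_hosp, test_hosp, threshold=0.5):
    # Train a stand-in mortality model at one hospital, evaluate at another,
    # and report train-minus-test differences in AUC and calibration slope,
    # plus the spread in false negative rates across race groups.
    train = df[df.hospital_id == train_hosp]
    test = df[df.hospital_id == test_hosp]
    model = LogisticRegression(max_iter=1000).fit(train[features], train["died"])

    def evaluate(part):
        y = part["died"].to_numpy()
        p = model.predict_proba(part[features])[:, 1]
        yhat = (p >= threshold).astype(int)
        race = part["race"].to_numpy()
        group_fnrs = [fnr(y[race == r], yhat[race == r]) for r in np.unique(race)]
        return {"auc": roc_auc_score(y, p),
                "cal_slope": calibration_slope(y, p),
                "fnr_disparity": np.nanmax(group_fnrs) - np.nanmin(group_fnrs)}

    m_train, m_test = evaluate(train), evaluate(test)
    gaps = {k: m_train[k] - m_test[k] for k in m_train}
    return gaps, m_test

For the causal-discovery step, the open-source causal-learn package provides an FCI implementation; a hedged one-liner follows (the variable selection and independence test would have to match the data types, and this is again not the authors' pipeline):

from causallearn.search.ConstraintBased.FCI import fci

# Returns a partial ancestral graph (PAG) that allows for unmeasured
# confounders; the default Fisher-z test assumes continuous variables.
pag, edges = fci(df[features + ["died"]].to_numpy())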
format Online
Article
Text
id pubmed-9931319
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9931319 2023-02-16. PLOS Digit Health (Research Article). Public Library of Science 2022-04-05. /pmc/articles/PMC9931319/ /pubmed/36812510 http://dx.doi.org/10.1371/journal.pdig.0000023. Text en. © 2022 Singh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931319/
https://www.ncbi.nlm.nih.gov/pubmed/36812510
http://dx.doi.org/10.1371/journal.pdig.0000023