Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees
BACKGROUND: The COVID-19 pandemic has led to an increased demand for health care resources and, in some cases, shortage of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict risk of hospitalization for patients infected with SARS-CoV-2. METHODS: W...
Main authors: | Gutierrez, Jahir M.; Volkovs, Maksims; Poutanen, Tomi; Watson, Tristan; Rosella, Laura C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | CMA Joule Inc. 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8695533/ https://www.ncbi.nlm.nih.gov/pubmed/34933880 http://dx.doi.org/10.9778/cmajo.20210036 |
_version_ | 1784619600519888896 |
---|---|
author | Gutierrez, Jahir M.; Volkovs, Maksims; Poutanen, Tomi; Watson, Tristan; Rosella, Laura C. |
author_facet | Gutierrez, Jahir M.; Volkovs, Maksims; Poutanen, Tomi; Watson, Tristan; Rosella, Laura C. |
author_sort | Gutierrez, Jahir M. |
collection | PubMed |
description | BACKGROUND: The COVID-19 pandemic has led to an increased demand for health care resources and, in some cases, shortage of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict risk of hospitalization for patients infected with SARS-CoV-2. METHODS: We used routinely collected health records in a patient cohort to develop and validate our prediction model. This cohort included adult patients (age ≥ 18 yr) from Ontario, Canada, who tested positive for SARS-CoV-2 ribonucleic acid by polymerase chain reaction between Feb. 2 and Oct. 5, 2020, and were followed up through Nov. 5, 2020. Patients living in long-term care facilities were excluded, as they were all assumed to be at high risk of hospitalization for COVID-19. Risk of hospitalization within 30 days of diagnosis of SARS-CoV-2 infection was estimated via gradient-boosting decision trees, and variable importance examined via Shapley values. We built a gradient-boosting model using the Extreme Gradient Boosting (XGBoost) algorithm and compared its performance against 4 empirical rules commonly used for risk stratifications based on age and number of comorbidities. RESULTS: The cohort included 36 323 patients with 2583 hospitalizations (7.1%). Hospitalized patients had a higher median age (64 yr v. 43 yr), were more likely to be male (56.3% v. 47.3%) and had a higher median number of comorbidities (3, interquartile range [IQR] 2–6 v. 1, IQR 0–3) than nonhospitalized patients. Patients were split into development (n = 29 058, 80.0%) and held-out validation (n = 7265, 20.0%) cohorts. The gradient-boosting model achieved high discrimination (development cohort: area under the receiver operating characteristic curve across the 5 folds of 0.852; validation cohort: 0.8475) and strong calibration (slope = 1.01, intercept = −0.01). The patients who scored at the top 10% captured 47.4% of hospitalizations, and those who scored at the top 30% captured 80.6%. 
INTERPRETATION: We developed and validated an accurate risk stratification model using routinely collected health administrative data. We envision that modelling such risk stratification based on routinely collected health data could support management of COVID-19 on a population health level. |
format | Online Article Text |
id | pubmed-8695533 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | CMA Joule Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-86955332021-12-24 Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees Gutierrez, Jahir M. Volkovs, Maksims Poutanen, Tomi Watson, Tristan Rosella, Laura C. CMAJ Open Research BACKGROUND: The COVID-19 pandemic has led to an increased demand for health care resources and, in some cases, shortage of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict risk of hospitalization for patients infected with SARS-CoV-2. METHODS: We used routinely collected health records in a patient cohort to develop and validate our prediction model. This cohort included adult patients (age ≥ 18 yr) from Ontario, Canada, who tested positive for SARS-CoV-2 ribonucleic acid by polymerase chain reaction between Feb. 2 and Oct. 5, 2020, and were followed up through Nov. 5, 2020. Patients living in long-term care facilities were excluded, as they were all assumed to be at high risk of hospitalization for COVID-19. Risk of hospitalization within 30 days of diagnosis of SARS-CoV-2 infection was estimated via gradient-boosting decision trees, and variable importance examined via Shapley values. We built a gradient-boosting model using the Extreme Gradient Boosting (XGBoost) algorithm and compared its performance against 4 empirical rules commonly used for risk stratifications based on age and number of comorbidities. RESULTS: The cohort included 36 323 patients with 2583 hospitalizations (7.1%). Hospitalized patients had a higher median age (64 yr v. 43 yr), were more likely to be male (56.3% v. 47.3%) and had a higher median number of comorbidities (3, interquartile range [IQR] 2–6 v. 1, IQR 0–3) than nonhospitalized patients. Patients were split into development (n = 29 058, 80.0%) and held-out validation (n = 7265, 20.0%) cohorts. 
The gradient-boosting model achieved high discrimination (development cohort: area under the receiver operating characteristic curve across the 5 folds of 0.852; validation cohort: 0.8475) and strong calibration (slope = 1.01, intercept = −0.01). The patients who scored at the top 10% captured 47.4% of hospitalizations, and those who scored at the top 30% captured 80.6%. INTERPRETATION: We developed and validated an accurate risk stratification model using routinely collected health administrative data. We envision that modelling such risk stratification based on routinely collected health data could support management of COVID-19 on a population health level. CMA Joule Inc. 2021-12-21 /pmc/articles/PMC8695533/ /pubmed/34933880 http://dx.doi.org/10.9778/cmajo.20210036 Text en © 2021 CMA Joule Inc. or its licensors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is noncommercial (i.e., research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/ |
spellingShingle | Research Gutierrez, Jahir M. Volkovs, Maksims Poutanen, Tomi Watson, Tristan Rosella, Laura C. Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
title | Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
title_full | Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
title_fullStr | Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
title_full_unstemmed | Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
title_short | Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
title_sort | risk stratification for covid-19 hospitalization: a multivariable model based on gradient-boosting decision trees |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8695533/ https://www.ncbi.nlm.nih.gov/pubmed/34933880 http://dx.doi.org/10.9778/cmajo.20210036 |
work_keys_str_mv | AT gutierrezjahirm riskstratificationforcovid19hospitalizationamultivariablemodelbasedongradientboostingdecisiontrees AT volkovsmaksims riskstratificationforcovid19hospitalizationamultivariablemodelbasedongradientboostingdecisiontrees AT poutanentomi riskstratificationforcovid19hospitalizationamultivariablemodelbasedongradientboostingdecisiontrees AT watsontristan riskstratificationforcovid19hospitalizationamultivariablemodelbasedongradientboostingdecisiontrees AT rosellalaurac riskstratificationforcovid19hospitalizationamultivariablemodelbasedongradientboostingdecisiontrees |
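
The abstract reports two ranking-based results: discrimination as the area under the ROC curve, and the share of hospitalizations "captured" by the top 10% and top 30% of scored patients. The sketch below shows one way such figures could be computed from predicted risks and observed outcomes. It is not the authors' code: the function names `capture_rate` and `auc`, the toy scores, and the toy labels are all hypothetical, and plain Python stands in for the XGBoost pipeline described in the record.

```python
from typing import Sequence


def capture_rate(scores: Sequence[float], labels: Sequence[int], top_fraction: float) -> float:
    """Share of all positive outcomes found among the highest-scoring patients.

    `top_fraction` is the fraction of the cohort to flag (e.g. 0.10 for the top 10%).
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive outcome is required")
    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Number of patients flagged; at least one so the result is defined.
    k = max(1, round(top_fraction * len(scores)))
    return sum(labels[i] for i in order[:k]) / total_positive


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in cohort size,
    which is fine for an illustration but slow for tens of thousands of patients.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical toy cohort: 10 patients, 3 hospitalized (label 1).
toy_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top20 = capture_rate(toy_scores, toy_labels, 0.20)  # 2 of 3 events in the top 2 patients
top50 = capture_rate(toy_scores, toy_labels, 0.50)  # all 3 events in the top 5 patients
toy_auc = auc(toy_scores, toy_labels)               # 20 of 21 positive/negative pairs ordered correctly

# Arithmetic check of the cohort figures quoted in the abstract.
hospitalization_rate = 2583 / 36323   # about 0.071, matching the reported 7.1%
validation_share = 7265 / 36323       # about 0.200, matching the reported 20.0%
```

Read this way, the abstract's "top 10% captured 47.4% of hospitalizations" corresponds to `capture_rate(scores, labels, 0.10)` evaluated on the model's predictions. The exact cut-off and tie-handling rules used in the study are not stated in this record.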