Uncovering Clinical Risk Factors and Predicting Severe COVID-19 Cases Using UK Biobank Data: Machine Learning Approach

Bibliographic Details
Main Authors: Wong, Kenneth Chi-Yin; Xiang, Yong; Yin, Liangying; So, Hon-Cheong
Format: Online Article, Text
Language: English
Published: JMIR Publications 2021
Subjects: Original Paper
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8485986/
https://www.ncbi.nlm.nih.gov/pubmed/34591027
http://dx.doi.org/10.2196/29544

Abstract

BACKGROUND: COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. More accurate prediction of those at risk of developing severe infections is of high clinical importance.

OBJECTIVE: Based on the UK Biobank (UKBB), we aimed to build machine learning models to predict the risk of developing severe or fatal infections and to uncover the major risk factors involved.

METHODS: We first restricted the analysis to infected individuals (n=7846) and then performed analysis at the population level, treating those with no known infection as controls (n=465,728 controls). Hospitalization was used as a proxy for severity. A total of 97 clinical variables (collected before the COVID-19 outbreak) covering demographic variables, comorbidities, blood measurements (eg, hematological, liver, renal function, and metabolic parameters), anthropometric measures, and other risk factors (eg, smoking and drinking) were included as predictors. We also constructed a simplified (“lite”) prediction model using 27 covariates that are easier to obtain (demographic and comorbidity data). XGBoost (gradient-boosted trees) was used for prediction, and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values (ShapVal), permutation importance (PermImp), and accuracy gain. Shapley dependence and interaction plots were used to evaluate the pattern of relationships between risk factors and outcomes.

RESULTS: A total of 2386 severe and 477 fatal cases were identified. For analyses within infected individuals (n=7846), our prediction model achieved an area under the receiver operating characteristic curve (AUC–ROC) of 0.723 (95% CI 0.711-0.736) for severe infection and 0.814 (95% CI 0.791-0.838) for fatal infection. The top 5 contributing factors (sorted by ShapVal) for severity were age, number of drugs taken (cnt_tx), cystatin C (reflecting renal function), waist-to-hip ratio (WHR), and Townsend deprivation index (TDI). For mortality, the top features were age, testosterone, cnt_tx, waist circumference (WC), and red cell distribution width. For analyses involving the whole UKBB population, AUCs for severity and fatality were 0.696 (95% CI 0.684-0.708) and 0.825 (95% CI 0.802-0.848), respectively. The same top 5 risk factors were identified for both outcomes: age, cnt_tx, WC, WHR, and TDI. Age, cystatin C, TDI, and cnt_tx ranked among the top 10 in all 4 analyses. Other diseases ranked highly by ShapVal or PermImp included type 2 diabetes mellitus (T2DM), coronary artery disease, atrial fibrillation, and dementia. For the “lite” models, predictive performance was broadly similar, with estimated AUCs of 0.716 (severity) and 0.818 (fatality) among infected individuals and 0.696 (severity) and 0.830 (fatality) in the whole population. The top-ranked variables were similar to those above, including age, cnt_tx, WC, sex (male), and T2DM.

CONCLUSIONS: Using XGBoost, we identified numerous baseline clinical risk factors for severe or fatal infection. For example, older age, central obesity, impaired renal function, multiple comorbidities, and cardiometabolic abnormalities may predispose to poorer outcomes. The prediction models may be useful at the population level to identify those susceptible to developing severe or fatal infections, facilitating targeted prevention strategies. A risk-prediction tool is also available online. Further replication in independent cohorts is required to verify our findings.
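
The abstract reports discrimination as AUC–ROC and ranks predictors partly by permutation importance (PermImp). The following is a minimal, self-contained Python sketch of how these two quantities are commonly computed. It is not the authors' code, it does not use XGBoost, and it runs on a hypothetical synthetic dataset; the names auc_roc, permutation_importance, make_toy_data, and toy_model are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A "model" here is any function mapping one feature row to a risk score.
Model = Callable[[List[float]], float]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Ties count as half a win (normalized Mann-Whitney U statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(model: Model, X: List[List[float]], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in AUC when one feature column is shuffled across rows.
    Shuffling keeps the feature's distribution but breaks its link to the
    outcome, so a large drop means the model relies on that feature."""
    rng = random.Random(seed)
    baseline = auc_roc(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc_roc(y, [model(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def make_toy_data(n: int = 400, seed: int = 42) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical synthetic cohort (NOT UK Biobank data): feature 0 is an
    age-like variable that drives risk; feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(40.0, 80.0)
        noise = rng.gauss(0.0, 1.0)
        risk = 1.0 / (1.0 + math.exp(-(age - 60.0) / 5.0))  # logistic in age
        y.append(1 if rng.random() < risk else 0)
        X.append([age, noise])
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Hypothetical stand-in for a fitted model: the risk score is simply age.
    toy_model: Model = lambda row: row[0]
    print("AUC:", round(auc_roc(y, [toy_model(r) for r in X]), 3))
    print("PermImp [age, noise]:", permutation_importance(toy_model, X, y))
```

Scoring every case-control pair is quadratic in cohort size; for a cohort the size of UKBB, a rank-based formula that sorts the scores once is the usual choice. The sketch also measures PermImp as a drop in AUC; the abstract does not state which metric the authors permuted against, so treat that choice as an assumption.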

Copyright: ©Kenneth Chi-Yin Wong, Yong Xiang, Liangying Yin, Hon-Cheong So. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 30.09.2021. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.