Development and validation of a machine learning model to predict venous thromboembolism among hospitalized cancer patients
Main authors: | , , , , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9583033/ https://www.ncbi.nlm.nih.gov/pubmed/36276886 http://dx.doi.org/10.1016/j.apjon.2022.100128 |
Summary: | OBJECTIVE: Hospitalized cancer patients are at high risk of venous thromboembolism (VTE). However, no predictive model has been specifically developed for this population. Machine learning (ML) is advantageous for model development. This study aimed to develop predictive models of VTE risk among hospitalized cancer patients using three different ML algorithms and logistic regression, and to compare their predictive performance. METHODS: A retrospective case–control study was conducted on hospitalized cancer patients at Hunan Cancer Hospital, China, between October 1, 2021, and February 30, 2022. Patients diagnosed with vein thrombosis before or after admission were excluded. Patient, tumor, treatment, and laboratory indicator information was obtained from the hospital information system. The data were randomly split into 80% for training and 20% for testing. Logistic regression and three ML algorithms (support vector machine, random forest, and extreme gradient boosting [XGBoost]) were used to develop the models. Model performance was compared using F1, G-mean, area under the receiver operating characteristic curve (AUROC), accuracy, precision, recall rate, and specificity. Features were ranked by the permutation scores of the selected features in the optimal model. RESULTS: A total of 1100 patients (mean [SD] age, 54.75 [11.08] years; 485 [44.09%] male) were included in this study. There were 340 patients (30.9%) in the VTE group. The XGBoost model achieved the best performance with the following evaluation metrics: F1 (0.750), G-mean (0.816), AUROC (0.818), accuracy (0.845), precision (0.750), recall rate (0.750), and specificity (0.888). D-dimer level, diabetes, hypertension, pleural metastasis, and hematological malignancies were identified as the five most significant features of the XGBoost model. CONCLUSIONS: Four predictive models were developed using ML algorithms. The XGBoost model outperformed the other three models. This study indicates that ML may play an important role in VTE risk estimation among hospitalized patients with cancer and provides a reference for thromboprophylaxis. |
---|
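The threshold-based metrics named in the summary (accuracy, precision, recall, specificity, F1, G-mean) all follow from a single confusion matrix. The sketch below shows the standard definitions, taking G-mean as the geometric mean of recall and specificity. The function name `classification_metrics` and the counts passed to it are hypothetical: the counts were chosen only because they are consistent with the reported XGBoost figures on a 220-patient test split (20% of 1100), and they are not the study's published confusion matrix. AUROC is omitted because it requires predicted probabilities rather than counts.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Return threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)      # share of predicted VTE that is true VTE
    recall = tp / (tp + fn)         # sensitivity: share of true VTE detected
    specificity = tn / (tn + fp)    # share of non-VTE correctly identified
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
        # G-mean is the geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts (an assumption for illustration, not study data).
metrics = classification_metrics(tp=51, fp=17, fn=17, tn=135)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the function reproduces the reported values to three decimals, for example G-mean = sqrt(0.750 × 0.888) ≈ 0.816, which suggests the reported metrics are internally consistent.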