Development and validation of predictive models for unplanned hospitalization in the Basque Country: analyzing the variability of non-deterministic algorithms

Bibliographic Details
Main Authors: Olza, Alexander; Millán, Eduardo; Rodríguez-Álvarez, María Xosé
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403913/
https://www.ncbi.nlm.nih.gov/pubmed/37543596
http://dx.doi.org/10.1186/s12911-023-02226-z
author Olza, Alexander
Millán, Eduardo
Rodríguez-Álvarez, María Xosé
collection PubMed
description BACKGROUND: The progressive ageing in developed countries entails an increase in multimorbidity. Population-wide predictive models for adverse health outcomes are crucial to address these growing healthcare needs. The main objective of this study is to develop and validate a population-based prognostic model to predict the probability of unplanned hospitalization in the Basque Country, by comparing the performance of a logistic regression model and three families of machine learning models.
METHODS: Using age, sex, diagnoses and drug prescriptions previously transformed by the Johns Hopkins Adjusted Clinical Groups (ACG) System, we predict the probability of unplanned hospitalization in the Basque Country (2.2 million inhabitants) using several techniques. When dealing with non-deterministic algorithms, comparing a single model per technique is not enough to choose the best approach. Thus, we conduct 40 experiments per family of models (Random Forest, Gradient Boosting Decision Trees and Multilayer Perceptrons) and compare them to Logistic Regression. Model performance is compared both population-wide and for the 20,000 patients with the highest predicted probabilities, as a hypothetical high-risk group to intervene on.
RESULTS: The best-performing technique is Multilayer Perceptron, followed by Gradient Boosting Decision Trees, Logistic Regression and Random Forest. Multilayer Perceptrons also have the lowest variability, around an order of magnitude less than Random Forests. Median area under the ROC curve, average precision and positive predictive value range from 0.789 to 0.802, 0.237 to 0.257 and 0.485 to 0.511, respectively. The median Brier score is 0.048 for all techniques. There is some overlap between the algorithms. For instance, Gradient Boosting Decision Trees perform better than Logistic Regression more than 75% of the time, but not always.
CONCLUSIONS: All models have good global performance. The only family that is consistently superior to Logistic Regression is Multilayer Perceptron, showing a very reliable performance with the lowest variability.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02226-z.
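The evaluation described in the abstract (a Brier score, a positive predictive value for the highest-risk patients, and a summary of how scores vary across repeated runs) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy data, the noise level and the baseline value are all hypothetical, and the "model" is simulated by adding seed-dependent noise to a made-up risk score rather than by training anything.

```python
import random
import statistics


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def ppv_at_top_k(y_true, y_prob, k):
    """Share of true positives among the k highest predicted probabilities."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(y for _, y in top) / len(top)


def summarize_runs(scores, baseline):
    """Median, interquartile range, and share of runs above a baseline score."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return {
        "median": statistics.median(scores),
        "iqr": q3 - q1,
        "share_above_baseline": sum(s > baseline for s in scores) / len(scores),
    }


if __name__ == "__main__":
    # Hypothetical fixed population: a made-up risk score and simulated outcomes.
    data_rng = random.Random(0)
    risk = [data_rng.random() * 0.2 for _ in range(2000)]
    outcome = [1 if data_rng.random() < r else 0 for r in risk]

    # 40 simulated runs: each seed perturbs the predictions differently,
    # standing in for the run-to-run variation of a non-deterministic learner.
    ppvs = []
    for seed in range(40):
        run_rng = random.Random(seed)
        pred = [min(1.0, max(0.0, r + run_rng.gauss(0, 0.02))) for r in risk]
        ppvs.append(ppv_at_top_k(outcome, pred, k=100))

    # Hypothetical deterministic baseline: predictions equal to the risk score.
    baseline_ppv = ppv_at_top_k(outcome, risk, k=100)
    print(summarize_runs(ppvs, baseline_ppv))
    print("Brier score of baseline:", brier_score(outcome, risk))
```

Reporting the median and spread over many seeds, rather than one score per technique, is what lets a statement such as "better than Logistic Regression more than 75% of the time" be made at all; the `share_above_baseline` field in the sketch is one simple way to express that kind of comparison.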
format Online
Article
Text
id pubmed-10403913
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10403913 2023-08-06 BMC Med Inform Decis Mak Research
BioMed Central 2023-08-05 /pmc/articles/PMC10403913/ /pubmed/37543596 http://dx.doi.org/10.1186/s12911-023-02226-z Text en
© The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Development and validation of predictive models for unplanned hospitalization in the Basque Country: analyzing the variability of non-deterministic algorithms
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403913/
https://www.ncbi.nlm.nih.gov/pubmed/37543596
http://dx.doi.org/10.1186/s12911-023-02226-z