Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia

INTRODUCTION: The prevalence of end-stage renal disease has raised the need for renal replacement therapy over recent decades. Even though a kidney transplant offers an improved quality of life and lower cost of care than dialysis, graft failure is possible after transplantation. Hence, this study a...

Bibliographic Details
Main Authors:	Mulugeta, Getahun; Zewotir, Temesgen; Tegegne, Awoke Seyoum; Juhar, Leja Hamza; Muleta, Mahteme Bekele
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10201495/ https://www.ncbi.nlm.nih.gov/pubmed/37217892 http://dx.doi.org/10.1186/s12911-023-02185-5

_version_	1785045275289583616
author	Mulugeta, Getahun; Zewotir, Temesgen; Tegegne, Awoke Seyoum; Juhar, Leja Hamza; Muleta, Mahteme Bekele
author_sort	Mulugeta, Getahun
collection	PubMed
description	INTRODUCTION: The prevalence of end-stage renal disease has increased the need for renal replacement therapy over recent decades. Although a kidney transplant offers a better quality of life and a lower cost of care than dialysis, graft failure is possible after transplantation. This study therefore aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models.
METHODOLOGY: The data were extracted from a retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center from September 2015 to February 2022. To address the imbalanced nature of the data, we performed hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Probabilistic models (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble models (random forest, bagged tree, and stochastic gradient boosting), selected on merit, were applied. Models were compared in terms of discrimination and calibration performance. The best-performing model was then used to predict the risk of graft failure.
RESULTS: A total of 278 complete cases were analyzed, with 21 graft failures and 3 events per predictor. Of the recipients, 74.8% were male and 25.2% were female, with a median age of 37. Among the individual models, the bagged tree and random forest had the highest, and equal, discrimination performance (AUC-ROC = 0.84), while the random forest had the best calibration performance (Brier score = 0.045). When each individual model was tested as a meta-learner for stacking ensemble learning, stochastic gradient boosting had the top discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) performance. Regarding feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications were the top predictors of graft failure.
CONCLUSIONS: Bagging, boosting, and stacking, combined with probability calibration, are good choices for clinical risk prediction on imbalanced data. A data-driven probability threshold is more beneficial than the natural threshold of 0.5 for improving prediction results from imbalanced data. Integrating various techniques in a systematic framework is an effective strategy for improving prediction results from imbalanced data. Clinical experts in kidney transplantation are recommended to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
format	Online Article Text
id	pubmed-10201495
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10201495 2023-05-23 BMC Med Inform Decis Mak Research BioMed Central 2023-05-22 /pmc/articles/PMC10201495/ /pubmed/37217892 http://dx.doi.org/10.1186/s12911-023-02185-5 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10201495/ https://www.ncbi.nlm.nih.gov/pubmed/37217892 http://dx.doi.org/10.1186/s12911-023-02185-5
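
Note on the metrics named in the abstract: the record reports calibration as a Brier score and favors a data-driven probability threshold over the default of 0.5. The sketch below is only an illustration of those two ideas, not the authors' method. The abstract does not say how the threshold was chosen, so the use of Youden's J (sensitivity + specificity - 1) is an assumption, the function names are hypothetical, and the ten outcomes and probabilities are made-up toy values rather than study data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def best_threshold(y_true, y_prob):
    """Return (threshold, J) maximising Youden's J over the observed probabilities.

    Assumption: Youden's J is used here as one common data-driven criterion;
    the abstract does not state which criterion the study used.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(y_prob)):
        # Predict "failure" when the probability is at or above the cut-off.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy, made-up example: 2 events among 10 cases (imbalanced outcome).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.60]

score = brier_score(y_true, y_prob)
threshold, youden_j = best_threshold(y_true, y_prob)
```

In this toy set the event with predicted probability 0.35 would be missed at a cut-off of 0.5 but is caught at the selected cut-off, which is the kind of gain threshold moving is meant to provide when events are rare.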
