
A dynamic machine learning model for prediction of NAFLD in a health checkup population: A longitudinal study

BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases worldwide. Currently, most NAFLD prediction models are diagnostic models based on cross-sectional data, which fail to provide early identification or clarify causal relationships. We aimed to use time-se...


Bibliographic Details
Main authors:	Deng, Yuhan; Ma, Yuan; Fu, Jingzhu; Wang, Xiaona; Yu, Canqing; Lv, Jun; Man, Sailimai; Wang, Bo; Li, Liming
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412833/ https://www.ncbi.nlm.nih.gov/pubmed/37576311 http://dx.doi.org/10.1016/j.heliyon.2023.e18758

author	Deng, Yuhan; Ma, Yuan; Fu, Jingzhu; Wang, Xiaona; Yu, Canqing; Lv, Jun; Man, Sailimai; Wang, Bo; Li, Liming
author_sort	Deng, Yuhan
collection	PubMed
description	BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases worldwide. Currently, most NAFLD prediction models are diagnostic models based on cross-sectional data, which fail to provide early identification or clarify causal relationships. We aimed to use time-series deep learning models with longitudinal health checkup records to predict the future onset of NAFLD, and to update the model stepwise by incorporating new checkup records to achieve dynamic prediction. METHODS: A retrospective cohort study was conducted with 10,493 participants from Beijing MJ Health Screening Center who had over 6 health checkup records. Data from the initial 5 checkups were incorporated stepwise, updating the model each time, to predict the risk of NAFLD at or after the sixth health checkup. A total of 33 variables were considered, covering demographic characteristics, medical history, lifestyle, physical examinations, and laboratory tests. L1-penalized logistic regression (LR) was used for feature selection. The long short-term memory (LSTM) algorithm was used for model development, and five-fold cross-validation was conducted to tune and select hyperparameters. Internal and external validation were conducted using a randomly divided 20% holdout test dataset and previously unseen data from Shanghai MJ Health Screening Center, respectively, to evaluate model performance. The evaluation metrics included area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Brier score, and decision curve. Bootstrap sampling was used to generate 95% confidence intervals for all metrics. Finally, the Shapley additive explanations (SHAP) algorithm was applied to the holdout test dataset to obtain time-specific and sample-specific contributions of each feature for model interpretability.
RESULTS: Among the 10,493 participants, 1662 (15.84%) were diagnosed with NAFLD at or after their sixth health checkup. The predictive performance of the deep learning model in the internal validation dataset improved as more checkups were incorporated, with AUROC increasing from 0.729 (95% CI: 0.698, 0.760) at baseline to 0.818 (95% CI: 0.798, 0.844) when 5 consecutive checkups were included. In the external validation dataset of 1728 participants, used to verify the results, AUROC increased from 0.700 (95% CI: 0.657, 0.740) with only the first checkup to 0.792 (95% CI: 0.758, 0.825) with all five. Feature importance results showed that body fat percentage, alanine transaminase (ALT), and uric acid had the greatest impact on the outcome; time-specific, individual-specific, and dynamic feature contributions were also produced for model interpretability. CONCLUSION: A dynamic prediction model was established, and its predictive capability kept improving as the latest checkup records were added. In addition, we identified key features associated with the onset of NAFLD, which may help optimize prevention and control strategies for the disease in the general population.
format	Online Article Text
id	pubmed-10412833
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-10412833 2023-08-11 Heliyon Research Article Elsevier 2023-07-27 /pmc/articles/PMC10412833/ /pubmed/37576311 http://dx.doi.org/10.1016/j.heliyon.2023.e18758 Text en © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	A dynamic machine learning model for prediction of NAFLD in a health checkup population: A longitudinal study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412833/ https://www.ncbi.nlm.nih.gov/pubmed/37576311 http://dx.doi.org/10.1016/j.heliyon.2023.e18758
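
The evaluation step named in the abstract (AUROC, Brier score, and bootstrap 95% confidence intervals) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the percentile bootstrap, the number of resamples, and the synthetic data are all assumptions made for illustration, and the values it prints have no relation to the reported results.

```python
import random


def auroc(y_true, y_score):
    """AUROC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(y_true, y_score, metric, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed variant; the abstract does not specify one)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        try:
            stats.append(metric([y_true[i] for i in idx],
                                [y_score[i] for i in idx]))
        except ValueError:
            continue  # resample had a single class; metric undefined, draw again
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in data (hypothetical): roughly 16% positives, noisy risk scores.
rng = random.Random(42)
y = [1 if rng.random() < 0.16 else 0 for _ in range(300)]
prob = [min(1.0, max(0.0, 0.15 + 0.25 * yi + rng.gauss(0, 0.15))) for yi in y]

auc_point = auroc(y, prob)
auc_lo, auc_hi = bootstrap_ci(y, prob, auroc)
brier_point = brier_score(y, prob)
print(f"AUROC {auc_point:.3f} (95% CI: {auc_lo:.3f}, {auc_hi:.3f}); Brier {brier_point:.3f}")
```

The pairwise AUROC is quadratic in sample size, which is acceptable for a small demonstration; on cohorts of the size reported here, a rank-based implementation from an established statistics library would be the more practical choice.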
