
Development and validation of explainable machine-learning models for carotid atherosclerosis early screening

BACKGROUND: Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China. METH...


Bibliographic Details
Main authors:	Yun, Ke; He, Tao; Zhen, Shi; Quan, Meihui; Yang, Xiaotao; Man, Dongliang; Zhang, Shuang; Wang, Wei; Han, Xiaoxu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10225282/ https://www.ncbi.nlm.nih.gov/pubmed/37246225 http://dx.doi.org/10.1186/s12967-023-04093-8

author	Yun, Ke; He, Tao; Zhen, Shi; Quan, Meihui; Yang, Xiaotao; Man, Dongliang; Zhang, Shuang; Wang, Wei; Han, Xiaoxu
collection	PubMed
description	BACKGROUND: Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China. METHODS: A total of 69,601 health check-up records from the health examination center of the First Hospital of China Medical University (Shenyang, China) were collected between 2018 and 2019. For the 2019 records, 80% were assigned to the training set and 20% to the testing set. The 2018 records were used as the external validation dataset. Ten ML algorithms, including decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear), were used to construct CAS screening models. The area under the receiver operating characteristic curve (auROC) and precision-recall curve (auPR) were used as measures of model performance. The SHapley Additive exPlanations (SHAP) method was used to demonstrate the interpretability of the optimal model. RESULTS: A total of 6315 records of patients undergoing carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. The GBDT model achieved the highest performance metrics with auROC of 0.860 (95% CI 0.839–0.880) in the internal validation dataset and 0.851 (95% CI 0.837–0.863) in the external validation dataset. Individuals with diabetes or those over 65 years of age showed low negative predictive value. 
In the interpretability analysis, age was the most important factor influencing the performance of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol. CONCLUSIONS: The ML models developed could provide good performance for CAS identification using routine health check-up indicators and could hopefully be applied in scenarios without ethnic and geographic heterogeneity for CAS prevention.
format	Online Article Text
id	pubmed-10225282
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10225282 2023-05-30 Development and validation of explainable machine-learning models for carotid atherosclerosis early screening Yun, Ke He, Tao Zhen, Shi Quan, Meihui Yang, Xiaotao Man, Dongliang Zhang, Shuang Wang, Wei Han, Xiaoxu J Transl Med Research BACKGROUND: Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China. METHODS: A total of 69,601 health check-up records from the health examination center of the First Hospital of China Medical University (Shenyang, China) were collected between 2018 and 2019. For the 2019 records, 80% were assigned to the training set and 20% to the testing set. The 2018 records were used as the external validation dataset. Ten ML algorithms, including decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear), were used to construct CAS screening models. The area under the receiver operating characteristic curve (auROC) and precision-recall curve (auPR) were used as measures of model performance. The SHapley Additive exPlanations (SHAP) method was used to demonstrate the interpretability of the optimal model. RESULTS: A total of 6315 records of patients undergoing carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. 
The GBDT model achieved the highest performance metrics with auROC of 0.860 (95% CI 0.839–0.880) in the internal validation dataset and 0.851 (95% CI 0.837–0.863) in the external validation dataset. Individuals with diabetes or those over 65 years of age showed low negative predictive value. In the interpretability analysis, age was the most important factor influencing the performance of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol. CONCLUSIONS: The ML models developed could provide good performance for CAS identification using routine health check-up indicators and could hopefully be applied in scenarios without ethnic and geographic heterogeneity for CAS prevention. BioMed Central 2023-05-29 /pmc/articles/PMC10225282/ /pubmed/37246225 http://dx.doi.org/10.1186/s12967-023-04093-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Development and validation of explainable machine-learning models for carotid atherosclerosis early screening
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10225282/ https://www.ncbi.nlm.nih.gov/pubmed/37246225 http://dx.doi.org/10.1186/s12967-023-04093-8

