Explainable Machine Learning Model to Prediction EGFR Mutation in Lung Cancer

Bibliographic Details
Main Authors: Yang, Ruiyuan, Xiong, Xingyu, Wang, Haoyu, Li, Weimin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Oncology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9259982/
https://www.ncbi.nlm.nih.gov/pubmed/35814445
http://dx.doi.org/10.3389/fonc.2022.924144
collection PubMed
description OBJECTIVES: The aim of this study is to determine whether clinical features, including blood markers, can be used to establish an explainable machine learning model that predicts epidermal growth factor receptor (EGFR) mutation in lung cancer. METHODS: We retrospectively analyzed 7,413 patients with lung adenocarcinoma (LA) diagnosed by gene sequencing at West China Hospital of Sichuan University from April 2015 to June 2019. The machine learning algorithms (MLAs) included logistic regression (LR), random forest (RF), LightGBM, support vector machine (SVM), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and decision tree (DT). Demographic characteristics, personal history, and blood markers were taken into account. The area under the receiver operating characteristic curve (AUC) was used to evaluate the prediction models, and SHapley Additive exPlanation (SHAP) values were used to explain them. RESULTS: Of the 7,413 patients with LA, 3,527 (47.6%) were identified with EGFR mutation. RF achieved the best performance in predicting EGFR mutation, with an AUC of 0.771 [95% confidence interval (CI): 0.770, 0.772], compared with an AUC of 0.740 (95% CI: 0.739, 0.741) for XGBoost. The five most influential features were smoking consumption, sex, cholesterol, age, and albumin-globulin ratio. The SHAP summary plot was used to explain the effect of the 12 features on the model, and the SHAP dependence plot to show how a single feature influences the output. CONCLUSION: We established EGFR mutation prediction models with MLAs and found that RF performed best, with an AUC of 0.771 (95% CI: 0.770, 0.772), which was better than the traditional models. Therefore, an artificial intelligence–based MLA prediction model may become a practical tool to guide the diagnosis and therapy of LA.
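The abstract evaluates the models by AUC with a 95% CI and explains them with SHAP values. As a minimal pure-Python sketch of what those quantities mean (not the authors' code, data, or pipeline), the block below computes the AUC as a Mann-Whitney statistic, a 95% CI by percentile bootstrap, and exact Shapley values by enumerating feature coalitions. Assumptions: the bootstrap CI is our choice, since the abstract does not say how its CIs were derived; the Shapley value function averages the model over a background set (the interventional formulation), whereas the shap library typically relies on TreeExplainer or sampling approximations for models such as RF and XGBoost; the function names and the toy linear model are hypothetical.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above a
    random negative case (ties count as 1/2), i.e. the Mann-Whitney statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (an assumed method; the abstract
    does not state how its CIs were obtained)."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("need both classes to resample")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   background: Sequence[Sequence[float]]) -> List[float]:
    """Exact Shapley values by enumerating every coalition of features.
    v(S) averages model(z) over background rows b, where z takes x's values
    on the features in S and b's values on the remaining features."""
    m = len(x)

    def value(coalition: frozenset) -> float:
        total = 0.0
        for b in background:
            z = [x[j] if j in coalition else b[j] for j in range(m)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(m):  # coalition sizes 0..m-1 drawn from the other features
            weight = math.factorial(k) * math.factorial(m - k - 1) / math.factorial(m)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))  # 0.75
    print(bootstrap_auc_ci(labels, scores, n_boot=200))

    # Hypothetical linear "risk score" over three generic features (not the study's model).
    toy_model = lambda z: 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[2]
    print(shapley_values(toy_model, [1.0, 0.0, 2.0], [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]))
```

The Shapley values always sum to model(x) minus the average model output over the background set, which is the additivity property SHAP summary plots build on. For the 12 features mentioned in the abstract, exhaustive enumeration is still feasible (2^11 coalitions per feature), but the cost doubles with every added feature, which is why model-specific or sampling-based explainers are used in practice.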
id pubmed-9259982
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Front Oncol, 2022-06-23
rights Copyright © 2022 Yang, Xiong, Wang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Oncology