A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas

Bibliographic Details
Main Authors: He, Ruimin; Yang, Xiaohua; Li, Tengxiang; He, Yaolin; Xie, Xiaoxue; Chen, Qilei; Zhang, Zijian; Cheng, Tingting
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563411/
https://www.ncbi.nlm.nih.gov/pubmed/36230590
http://dx.doi.org/10.3390/cancers14194664
_version_ 1784808397823016960
author He, Ruimin
Yang, Xiaohua
Li, Tengxiang
He, Yaolin
Xie, Xiaoxue
Chen, Qilei
Zhang, Zijian
Cheng, Tingting
author_sort He, Ruimin
collection PubMed
description SIMPLE SUMMARY: Targeted therapy against epidermal growth factor receptor (EGFR) mutations has become the standard of care for non-small cell lung cancer (NSCLC), but an efficient genetic test for these patients has been lacking. The present study aims to find a novel data-driven testing method that can effectively predict EGFR mutation status using a prediction model that incorporates clinical features. The results provide a theoretical basis for establishing an effective mutation prediction model, which can serve as a reference to aid EGFR mutation diagnosis and the subsequent course of treatment. ABSTRACT: Data from 758 patients with lung adenocarcinoma were retrospectively collected. All patients had undergone computed tomography imaging and EGFR gene testing. Radiomic features were extracted using the medical imaging tool 3D-Slicer and were combined with the clinical features to build a machine learning prediction model. The high-dimensional feature set was screened for optimal feature subsets using principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). Model prediction of EGFR mutation status in the validation group was evaluated using multiple classifiers. Six clinical features and 622 radiomic features were initially collected. Thirty-one radiomic features with non-zero coefficients were obtained by LASSO regression, and 24 features correlated with label values were obtained by PCA. The radiomic features shared by these two methods were selected and combined with the clinical features of the respective patient to form a subset of features related to EGFR mutations. The full dataset was partitioned into training and test sets at a ratio of 7:3 using 10-fold cross-validation. The area under the curve (AUC) and accuracy (Acc) of the four classifiers with cross-validation were: (1) K-nearest neighbor (AUCmean = 0.83, Acc = 81%); (2) random forest (AUCmean = 0.91, Acc = 83%); (3) LGBM (AUCmean = 0.94, Acc = 88%); and (4) support vector machine (AUCmean = 0.79, Acc = 83%). In summary, the subset of radiomic and clinical features selected by feature engineering effectively predicted the EGFR mutation status of this NSCLC patient cohort.
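The mean cross-validated AUC reported in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library, a hand-written k-nearest-neighbor scorer, and synthetic data in place of the radiomic and clinical features. All function names (`auc`, `knn_score`, `stratified_folds`, `cross_validated_auc`) and parameter values (k = 5, four features, 200 samples) are assumptions chosen for illustration.

```python
import random

def auc(scores, labels):
    """AUC via the rank-sum identity: the share of (positive, negative)
    pairs in which the positive scores higher; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train_x, train_y, x, k=5):
    """Fraction of the k nearest training samples (squared Euclidean
    distance) that carry label 1; used as the predicted score."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k

def stratified_folds(labels, n_folds, rng):
    """Deal the shuffled indices of each class round-robin so that
    every fold contains both classes."""
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds

def cross_validated_auc(xs, ys, n_folds=10, k=5, seed=0):
    """Return the mean AUC over the folds and the per-fold AUCs."""
    rng = random.Random(seed)
    fold_aucs = []
    for test_idx in stratified_folds(ys, n_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        scores = [knn_score(train_x, train_y, xs[i], k) for i in test_idx]
        fold_aucs.append(auc(scores, [ys[i] for i in test_idx]))
    return sum(fold_aucs) / len(fold_aucs), fold_aucs

# Synthetic stand-in data (assumption): 200 samples, 4 features, with the
# two classes separated by a shift in the feature means.
data_rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [[data_rng.gauss(1.5 * y, 1.0) for _ in range(4)] for y in ys]

mean_auc, fold_aucs = cross_validated_auc(xs, ys)
print(f"mean AUC over {len(fold_aucs)} folds: {mean_auc:.2f}")
```

Averaging per-fold AUCs, rather than pooling all held-out scores into one curve, is one common reading of "AUCmean"; the record does not say which convention the study used.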
format Online
Article
Text
id pubmed-9563411
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9563411 2022-10-15 A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas He, Ruimin Yang, Xiaohua Li, Tengxiang He, Yaolin Xie, Xiaoxue Chen, Qilei Zhang, Zijian Cheng, Tingting Cancers (Basel) Article MDPI 2022-09-25 /pmc/articles/PMC9563411/ /pubmed/36230590 http://dx.doi.org/10.3390/cancers14194664 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas
topic Article