A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas

Bibliographic Details
Main authors: He, Ruimin; Yang, Xiaohua; Li, Tengxiang; He, Yaolin; Xie, Xiaoxue; Chen, Qilei; Zhang, Zijian; Cheng, Tingting
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563411/
https://www.ncbi.nlm.nih.gov/pubmed/36230590
http://dx.doi.org/10.3390/cancers14194664
Description
Summary: SIMPLE SUMMARY: Targeted therapy against epidermal growth factor receptor (EGFR) mutations has become the standard of care for non-small cell lung cancer (NSCLC), yet no efficient genetic test has been available for these patients. The present study aims to find a novel data-driven testing method that predicts EGFR mutation status with a model combining radiomic and clinical features. The results provide a strong theoretical basis for establishing an effective mutation prediction model, which can serve as a useful reference for EGFR mutation diagnosis and the subsequent treatment course.

ABSTRACT: Data from 758 patients with lung adenocarcinoma were retrospectively collected. All patients had undergone computed tomography imaging and EGFR gene testing. Radiomic features were extracted using the medical imaging tool 3D-Slicer and were combined with the clinical features to build a machine learning prediction model. The high-dimensional feature set was screened for optimal feature subsets using principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). Model prediction of EGFR mutation status in the validation group was evaluated using multiple classifiers. Six clinical features and 622 radiomic features were initially collected. Thirty-one radiomic features with non-zero coefficients were obtained by LASSO regression, and 24 features correlated with label values were obtained by PCA. The radiomic features shared by these two methods were selected and combined with the clinical features of the respective patient to form a subset of features related to EGFR mutations. The full dataset was partitioned into training and test sets at a ratio of 7:3, with 10-fold cross-validation. The mean area under the curve (AUC) and accuracy (Acc) of the four classifiers under cross-validation were: (1) K-nearest neighbor (mean AUC = 0.83, Acc = 81%); (2) random forest (mean AUC = 0.91, Acc = 83%); (3) light gradient boosting machine, LGBM (mean AUC = 0.94, Acc = 88%); and (4) support vector machine (mean AUC = 0.79, Acc = 83%). In summary, the subset of radiographic and clinical features selected by feature engineering effectively predicted the EGFR mutation status of this NSCLC patient cohort.
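As a rough illustration of how cross-validated metrics like the mean AUC and accuracy reported above can be computed, the sketch below runs a k-nearest-neighbor classifier through 10-fold cross-validation on synthetic data. It is not the authors' pipeline: the synthetic two-feature data, the stratified fold assignment, k = 5, the 0.5 decision threshold, and all function names are assumptions made for the example.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_X, train_y, x, k=5):
    """Fraction of the k nearest training points (Euclidean) labelled 1."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    return sum(label for _, label in nearest) / k


def cross_validate(X, y, n_folds=10, k=5, seed=0):
    """Return (mean AUC, mean accuracy) over stratified folds.

    Stratification is an assumption of this sketch; it keeps both classes
    present in every fold so that the AUC is always defined.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    pos_idx = [i for i in idx if y[i] == 1]
    neg_idx = [i for i in idx if y[i] == 0]
    folds = [pos_idx[f::n_folds] + neg_idx[f::n_folds] for f in range(n_folds)]

    aucs, accs = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        scores = [knn_score(train_X, train_y, X[i], k) for i in fold]
        truth = [y[i] for i in fold]
        aucs.append(roc_auc(truth, scores))
        # Predict "mutant" (1) when at least half of the neighbors are 1.
        correct = sum((s >= 0.5) == (t == 1) for s, t in zip(scores, truth))
        accs.append(correct / len(fold))
    return mean(aucs), mean(accs)


# Synthetic stand-in for a selected feature subset: one informative
# feature (class means 0 and 3) and one pure-noise feature.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

mean_auc, mean_acc = cross_validate(X, y)
print(f"mean AUC = {mean_auc:.2f}, mean accuracy = {mean_acc:.0%}")
```

The AUC here is computed per held-out fold and then averaged, which is one common reading of a "mean AUC" under cross-validation; the record does not state how the authors aggregated their folds.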