A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas
SIMPLE SUMMARY: Targeted therapy against epidermal growth factor receptor (EGFR) mutations has become the standard of care for non-small cell lung cancer, but there has not been an efficient genetic test for non-small cell lung cancer patients. The present study aims to find a novel data-driven genetic testi...
Main Authors: | He, Ruimin; Yang, Xiaohua; Li, Tengxiang; He, Yaolin; Xie, Xiaoxue; Chen, Qilei; Zhang, Zijian; Cheng, Tingting |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563411/ https://www.ncbi.nlm.nih.gov/pubmed/36230590 http://dx.doi.org/10.3390/cancers14194664 |
_version_ | 1784808397823016960 |
---|---|
author | He, Ruimin; Yang, Xiaohua; Li, Tengxiang; He, Yaolin; Xie, Xiaoxue; Chen, Qilei; Zhang, Zijian; Cheng, Tingting |
author_facet | He, Ruimin; Yang, Xiaohua; Li, Tengxiang; He, Yaolin; Xie, Xiaoxue; Chen, Qilei; Zhang, Zijian; Cheng, Tingting |
author_sort | He, Ruimin |
collection | PubMed |
description | SIMPLE SUMMARY: Targeted therapy against epidermal growth factor receptor (EGFR) mutations has become the standard of care for non-small cell lung cancer, but an efficient genetic test for these patients has been lacking. The present study aims to find a novel data-driven genetic testing method that can effectively predict the mutation status of EGFR based on a prediction model combining clinical features. The results of this study provide a theoretical basis for the establishment of an effective mutation prediction model. The prediction model can provide a useful reference to aid EGFR mutation diagnosis and the subsequent course of treatment. ABSTRACT: Data from 758 patients with lung adenocarcinoma were retrospectively collected. All patients had undergone computed tomography imaging and EGFR gene testing. Radiomic features were extracted using the medical imaging tool 3D-Slicer and were combined with the clinical features to build a machine learning prediction model. The high-dimensional feature set was screened for optimal feature subsets using principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). Model prediction of EGFR mutation status in the validation group was evaluated using multiple classifiers. Six clinical features and 622 radiomic features were initially collected. Thirty-one radiomic features with non-zero coefficients were obtained by LASSO regression, and 24 features correlated with label values were obtained by PCA. The radiomic features shared by these two methods were selected and combined with the clinical features of the respective patient to form a subset of features related to EGFR mutations. The full dataset was partitioned into training and test sets at a ratio of 7:3, with 10-fold cross-validation. The mean area under the curve (AUC) and accuracy (Acc) of the four classifiers under cross-validation were: (1) K-nearest neighbor (AUCmean = 0.83, Acc = 81%); (2) random forest (AUCmean = 0.91, Acc = 83%); (3) LGBM (AUCmean = 0.94, Acc = 88%); and (4) support vector machine (AUCmean = 0.79, Acc = 83%). In summary, the subset of radiographic and clinical features selected by feature engineering effectively predicted the EGFR mutation status of this NSCLC patient cohort. |
format | Online Article Text |
id | pubmed-9563411 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9563411 2022-10-15 A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas He, Ruimin; Yang, Xiaohua; Li, Tengxiang; He, Yaolin; Xie, Xiaoxue; Chen, Qilei; Zhang, Zijian; Cheng, Tingting Cancers (Basel) Article MDPI 2022-09-25 /pmc/articles/PMC9563411/ /pubmed/36230590 http://dx.doi.org/10.3390/cancers14194664 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article He, Ruimin Yang, Xiaohua Li, Tengxiang He, Yaolin Xie, Xiaoxue Chen, Qilei Zhang, Zijian Cheng, Tingting A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas |
title | A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas |
title_full | A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas |
title_fullStr | A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas |
title_full_unstemmed | A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas |
title_short | A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas |
title_sort | machine learning-based predictive model of epidermal growth factor mutations in lung adenocarcinomas |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563411/ https://www.ncbi.nlm.nih.gov/pubmed/36230590 http://dx.doi.org/10.3390/cancers14194664 |
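The abstract in the description field outlines a workflow: keep the radiomic features selected by both LASSO and PCA, then score classifiers by their mean AUC under 10-fold cross-validation. The sketch below illustrates those two steps with the Python standard library only. It is not the authors' code: every function name is hypothetical, the toy cohort is synthetic, the K-nearest-neighbor scorer is a simplified stand-in for one of the four classifiers named in the abstract, and the 7:3 split is not reproduced.

```python
import math
import random
from statistics import mean


def shared_features(lasso_selected, pca_selected):
    """Return the features picked by both selection methods, sorted by name."""
    return sorted(set(lasso_selected) & set(pca_selected))


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, query, n_neighbors=5):
    """Score a query as the fraction of positive labels among its nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cross_validated_auc(x, y, k=10, n_neighbors=5, seed=0):
    """Mean AUC over k folds; each fold is scored using only the other folds as training data."""
    fold_aucs = []
    for test_idx in k_fold_indices(len(x), k, seed):
        test_y = [y[i] for i in test_idx]
        if len(set(test_y)) < 2:
            continue  # AUC is undefined when a fold holds a single class
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        scores = [knn_score(train_x, train_y, x[i], n_neighbors) for i in test_idx]
        fold_aucs.append(roc_auc(test_y, scores))
    if not fold_aucs:
        raise ValueError("no fold contained both classes")
    return mean(fold_aucs)


def make_toy_cohort(n_samples=60, seed=1):
    """Synthetic two-feature data (an assumption for illustration, not patient data)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    x = [[rng.gauss(4.0 * label, 1.0), rng.gauss(4.0 * label, 1.0)] for label in y]
    return x, y
```

Calling `cross_validated_auc(*make_toy_cohort())` returns the fold-averaged AUC for the toy data. In practice a library such as scikit-learn would likely replace these hand-written helpers; the standard-library version is used here only to keep the sketch self-contained.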