A Systems Biology and LASSO-Based Approach to Decipher the Transcriptome–Interactome Signature for Predicting Non-Small Cell Lung Cancer
Main authors: | , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
MDPI
2022
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9774707/ https://www.ncbi.nlm.nih.gov/pubmed/36552262 http://dx.doi.org/10.3390/biology11121752 |
Summary: | SIMPLE SUMMARY: Non-small cell lung cancer (NSCLC) is a serious public health issue due to its high mortality rate. To improve NSCLC survival through better treatment, it is imperative to develop a biomarker-based prediction tool that can accurately identify NSCLC at a very early stage. Cancer development begins with aberrations in gene expression and regulatory networks; these features therefore hold great potential for diagnosing cancer earlier than visible morphological and pathological changes allow. In this study, we integrated gene expression and interactome data to identify candidate genes altered in NSCLC compared with normal samples. We then used a machine learning technique to identify a signature of 17 genes and developed a model for predicting NSCLC. Our model predicted NSCLC across different independent test datasets with high accuracy. Finally, the model was implemented as a user-friendly web tool, NSCLCpred, which predicts NSCLC from the expression profile of the 17 genes. We expect that our findings will guide the identification of NSCLC patients and provide more insight into disease development.
ABSTRACT: The lack of precise molecular signatures limits the early diagnosis of non-small cell lung cancer (NSCLC). The present study used gene expression data and interaction networks to develop a highly accurate model with the least absolute shrinkage and selection operator (LASSO) for predicting NSCLC. Differentially expressed genes (DEGs) were identified in NSCLC compared with normal tissues using TCGA and GTEx data. A biological network was constructed from the DEGs, and the top 20 upregulated and top 20 downregulated hub genes were identified. These hub genes were used to identify signature genes with LASSO-penalized logistic regression to predict NSCLC.
Model development involved the following steps: (i) the dataset was divided into 80% for training (TR) and 20% for testing (TD1); (ii) a LASSO logistic regression analysis with 10-fold cross-validation was performed on the TR, which identified a combination of 17 genes as NSCLC predictors; these genes were then used to develop the LASSO model. The model's performance was assessed on the TD1 dataset, where it achieved an accuracy of 0.986 and an area under the receiver operating characteristic curve (AUC-ROC) of 0.998. Furthermore, the LASSO model was evaluated on three independent NSCLC test datasets and achieved high accuracy, with an AUC-ROC of >0.99 (GSE18842), >0.99 (GSE27262), and 0.95 (GSE19804). Based on this study, a web application called NSCLCpred was developed to predict NSCLC. |
---|
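
The model-development steps in the abstract (an 80/20 split, LASSO logistic regression tuned by 10-fold cross-validation, then accuracy and AUC-ROC on the held-out set) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (100 samples, 40 hypothetical "hub genes", 5 of them informative), the solver is a plain proximal-gradient loop written for this example, and the penalty grid, learning rate, and all function names are assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=100):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        err = [pi - yi for pi, yi in zip(predict_proba(X, w, b), y)]
        b -= lr * sum(err) / n  # the intercept is not penalized
        grad = [sum(e * xi[j] for e, xi in zip(err, X)) / n for j in range(p)]
        # Gradient step, then shrinkage; weights that end at exactly 0.0
        # correspond to genes dropped from the signature.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w, b

def accuracy(y, probs):
    return sum((pi >= 0.5) == (yi == 1) for yi, pi in zip(y, probs)) / len(y)

def auc_roc(y, probs):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [pi for yi, pi in zip(y, probs) if yi == 1]
    neg = [pi for yi, pi in zip(y, probs) if yi == 0]
    wins = sum((a > c) + 0.5 * (a == c) for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

def cv_select_lambda(X, y, lambdas, k=10):
    """k-fold cross-validation: keep the penalty with the best mean accuracy."""
    best_lam, best_score = None, -1.0
    for lam in lambdas:
        scores = []
        for fold in range(k):
            tr = [i for i in range(len(X)) if i % k != fold]
            va = [i for i in range(len(X)) if i % k == fold]
            w, b = fit_lasso_logistic([X[i] for i in tr], [y[i] for i in tr], lam)
            probs = predict_proba([X[i] for i in va], w, b)
            scores.append(accuracy([y[i] for i in va], probs))
        mean_score = sum(scores) / k
        if mean_score > best_score:  # strict '>' keeps the earlier lambda on ties
            best_lam, best_score = lam, mean_score
    return best_lam

# Synthetic stand-in for expression data (assumption, not TCGA/GTEx).
random.seed(0)
N_SAMPLES, N_GENES, N_INFORMATIVE = 100, 40, 5
y = [i % 2 for i in range(N_SAMPLES)]  # 1 = tumor, 0 = normal
random.shuffle(y)
X = [[random.gauss(0, 1) + ((2 * yi - 1) if j < N_INFORMATIVE else 0.0)
      for j in range(N_GENES)] for yi in y]

# (i) 80% training (TR) and 20% testing (TD1).
n_train = int(0.8 * N_SAMPLES)
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# (ii) 10-fold CV on TR; the grid runs from strongest to weakest penalty,
# so ties favor the sparser model.
lam = cv_select_lambda(X_tr, y_tr, lambdas=(0.2, 0.05, 0.01), k=10)
w, b = fit_lasso_logistic(X_tr, y_tr, lam)
selected = [j for j, wj in enumerate(w) if wj != 0.0]

probs_te = predict_proba(X_te, w, b)
test_acc, test_auc = accuracy(y_te, probs_te), auc_roc(y_te, probs_te)
print(f"lambda={lam}, genes kept={len(selected)}, "
      f"accuracy={test_acc:.3f}, AUC-ROC={test_auc:.3f}")
```

The soft-thresholding step is what makes the LASSO a feature selector: any coefficient whose gradient signal stays below the penalty is set to exactly zero, which is how a 40-gene candidate list can shrink to a smaller signature. On real expression data a maintained solver would normally replace the hand-written loop.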