
A Systems Biology and LASSO-Based Approach to Decipher the Transcriptome–Interactome Signature for Predicting Non-Small Cell Lung Cancer


Bibliographic Details
Main authors: Ahmed, Firoz, Khan, Abdul Arif, Ansari, Hifzur Rahman, Haque, Absarul
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9774707/
https://www.ncbi.nlm.nih.gov/pubmed/36552262
http://dx.doi.org/10.3390/biology11121752
author Ahmed, Firoz
Khan, Abdul Arif
Ansari, Hifzur Rahman
Haque, Absarul
collection PubMed
description SIMPLE SUMMARY: Non-small cell lung cancer (NSCLC) is a serious public health issue because of its high mortality rate. Improving NSCLC survival through better treatment requires a biomarker-based prediction tool that can accurately identify NSCLC at a very early stage. Cancer development begins with aberrations in gene expression and regulatory networks; these features therefore hold great potential for diagnosing cancer earlier than visible morphological and pathological changes allow. In this study, we integrated gene expression and interactome data to identify candidate genes altered in NSCLC compared with normal samples. We then used a machine learning technique to identify a signature of 17 genes and developed a model for predicting NSCLC. The model predicted NSCLC across different independent test datasets with high accuracy. Finally, the model was implemented as a user-friendly web tool, NSCLCpred, which predicts NSCLC from the expression profile of the 17 genes. We expect that our findings will guide the identification of NSCLC patients and provide more insight into disease development.
ABSTRACT: The lack of precise molecular signatures limits the early diagnosis of non-small cell lung cancer (NSCLC). The present study used gene expression data and interaction networks to develop a highly accurate model for predicting NSCLC with the least absolute shrinkage and selection operator (LASSO). Differentially expressed genes (DEGs) in NSCLC compared with normal tissues were identified using TCGA and GTEx data. A biological network was constructed from the DEGs, and the top 20 upregulated and top 20 downregulated hub genes were identified. These hub genes were used to identify signature genes through LASSO-penalized logistic regression. Model development involved the following steps: (i) the dataset was divided into 80% for training (TR) and 20% for testing (TD1); (ii) a LASSO logistic regression analysis with 10-fold cross-validation was performed on TR and identified a combination of 17 genes as NSCLC predictors, which were then used to develop the LASSO model. On the TD1 dataset, the model achieved an accuracy of 0.986 and an area under the receiver operating characteristic curve (AUC-ROC) of 0.998. The LASSO model was further evaluated on three independent NSCLC test datasets (GSE18842, GSE27262, GSE19804) and achieved high accuracy, with AUC-ROC values of >0.99, >0.99, and 0.95, respectively. Based on this study, a web application called NSCLCpred was developed to predict NSCLC.
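The modelling steps named in the abstract (an 80/20 split, LASSO logistic regression tuned by 10-fold cross-validation, then accuracy and AUC-ROC on held-out data) can be illustrated with a minimal sketch. This is not the authors' code: the record does not say which software they used, and everything below is an assumption made for illustration, including the synthetic six-"gene" data, the penalty grid, the proximal-gradient solver, and every function name. It uses only the Python standard library.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]

def fit_lasso_logistic(X, y, lam, step=0.4, n_iter=150):
    """Proximal gradient descent on mean log-loss + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        errors = [pi - yi for pi, yi in zip(predict_proba(X, w, b), y)]
        b -= step * sum(errors) / n  # the intercept is not penalized
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / n for j in range(p)]
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w, b

def auc_roc(y, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y, scores, cutoff=0.5):
    return sum((s >= cutoff) == (t == 1) for s, t in zip(scores, y)) / len(y)

def choose_lambda(X, y, lambdas, k=10):
    """Pick the penalty with the best mean AUC over k label-stratified folds."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [order[f::k] for f in range(k)]  # each fold gets both classes

    def mean_auc(lam):
        total = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in range(len(y)) if i not in held]
            w, b = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
            total += auc_roc([y[i] for i in fold], predict_proba([X[i] for i in fold], w, b))
        return total / k

    return max(lambdas, key=mean_auc)

# Synthetic stand-in for expression data (assumption): 120 samples, 6 "genes",
# of which only genes 0 and 1 differ between normal (0) and tumor (1).
rng = random.Random(0)
shifts = [1.5, -1.5, 0.0, 0.0, 0.0, 0.0]
y_all = [i % 2 for i in range(120)]
X_all = [[rng.gauss(s if t else -s, 1.0) for s in shifts] for t in y_all]

# Stratified 80/20 split into training (TR) and test (TD1) sets.
train_idx, test_idx = [], []
for label in (0, 1):
    members = [i for i, t in enumerate(y_all) if t == label]
    rng.shuffle(members)
    cut = len(members) * 4 // 5
    train_idx += members[:cut]
    test_idx += members[cut:]
X_tr, y_tr = [X_all[i] for i in train_idx], [y_all[i] for i in train_idx]
X_te, y_te = [X_all[i] for i in test_idx], [y_all[i] for i in test_idx]

best_lam = choose_lambda(X_tr, y_tr, [0.01, 0.05, 0.2])  # hypothetical grid
w, b = fit_lasso_logistic(X_tr, y_tr, best_lam)
selected = [j for j, wj in enumerate(w) if wj != 0.0]  # genes kept by the LASSO
test_scores = predict_proba(X_te, w, b)
test_auc = auc_roc(y_te, test_scores)
test_acc = accuracy(y_te, test_scores)
print("lambda:", best_lam, "selected genes:", selected)
print("test accuracy:", round(test_acc, 3), "test AUC-ROC:", round(test_auc, 3))
```

The L1 penalty is what makes the fit a feature selector: soft-thresholding sets a coefficient to exactly zero whenever its gradient step stays below the penalty, which is how a 40-gene hub list could be reduced to a smaller signature. In practice a maintained library implementation would normally replace the hand-written solver used here.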
format Online
Article
Text
id pubmed-9774707
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9774707 2022-12-23 A Systems Biology and LASSO-Based Approach to Decipher the Transcriptome–Interactome Signature for Predicting Non-Small Cell Lung Cancer Ahmed, Firoz Khan, Abdul Arif Ansari, Hifzur Rahman Haque, Absarul Biology (Basel) Article MDPI 2022-11-30 /pmc/articles/PMC9774707/ /pubmed/36552262 http://dx.doi.org/10.3390/biology11121752 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Systems Biology and LASSO-Based Approach to Decipher the Transcriptome–Interactome Signature for Predicting Non-Small Cell Lung Cancer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9774707/
https://www.ncbi.nlm.nih.gov/pubmed/36552262
http://dx.doi.org/10.3390/biology11121752