Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma
BACKGROUND: Many prognostic biomarkers reported for esophageal squamous cell carcinoma (ESCC) suffer from low reproducibility owing to the high molecular heterogeneity of ESCC. The purpose of this study was to identify the optimal biomarkers for ESCC using machine...
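Main authors: | Li, Meng-Xiang; Sun, Xiao-Meng; Cheng, Wei-Gang; Ruan, Hao-Jie; Liu, Ke; Chen, Pan; Xu, Hai-Jun; Gao, She-Gan; Feng, Xiao-Shan; Qi, Yi-Jun |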
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351329/ https://www.ncbi.nlm.nih.gov/pubmed/34372798 http://dx.doi.org/10.1186/s12885-021-08647-1 |
_version_ | 1783735951677194240 |
---|---|
author | Li, Meng-Xiang Sun, Xiao-Meng Cheng, Wei-Gang Ruan, Hao-Jie Liu, Ke Chen, Pan Xu, Hai-Jun Gao, She-Gan Feng, Xiao-Shan Qi, Yi-Jun |
author_facet | Li, Meng-Xiang Sun, Xiao-Meng Cheng, Wei-Gang Ruan, Hao-Jie Liu, Ke Chen, Pan Xu, Hai-Jun Gao, She-Gan Feng, Xiao-Shan Qi, Yi-Jun |
author_sort | Li, Meng-Xiang |
collection | PubMed |
description | BACKGROUND: Many prognostic biomarkers reported for esophageal squamous cell carcinoma (ESCC) suffer from low reproducibility owing to the high molecular heterogeneity of ESCC. The purpose of this study was to identify the optimal biomarkers for ESCC using machine learning algorithms. METHODS: Biomarkers related to clinical survival, recurrence or therapeutic response of patients with ESCC were identified through a literature database search. Forty-eight biomarkers linked to recurrence or prognosis of ESCC were used to construct a molecular interaction network based on NetBox and then to identify the functional modules. Publicly available mRNA transcriptome data of ESCC were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA); the datasets were GSE53625 and TCGA-ESCC. Five machine learning algorithms, namely logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF) and XGBoost, were used to develop prognostic classifiers for feature selection. The area under the ROC curve (AUC) was used to evaluate the performance of the prognostic classifiers. The importance of the identified molecules was ranked by their occurrence frequencies in the prognostic classifiers. Kaplan-Meier survival analysis and the log-rank test were performed to determine the statistical significance of differences in overall survival. RESULTS: A total of 48 clinically proven molecules associated with ESCC progression were used to construct a molecular interaction network with 3 functional modules comprising 17 component molecules. For each machine learning algorithm, 131,071 prognostic classifiers were built using these 17 molecules. 
The importance of these 17 molecules was ranked by their occurrence frequencies in the prognostic classifiers whose AUCs exceeded the mean of all 131,071 AUCs. On this basis, stratifin, encoded by SFN, was identified as the optimal prognostic biomarker for ESCC, and its performance was further validated in 2 additional independent cohorts. CONCLUSION: The occurrence frequencies across various feature selection approaches reflect the degree of clinical importance, and stratifin is an optimal prognostic biomarker for ESCC. |
format | Online Article Text |
id | pubmed-8351329 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8351329 2021-08-09 BMC Cancer Research Article BioMed Central 2021-08-09 /pmc/articles/PMC8351329/ /pubmed/34372798 http://dx.doi.org/10.1186/s12885-021-08647-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Li, Meng-Xiang Sun, Xiao-Meng Cheng, Wei-Gang Ruan, Hao-Jie Liu, Ke Chen, Pan Xu, Hai-Jun Gao, She-Gan Feng, Xiao-Shan Qi, Yi-Jun Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
title | Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
title_full | Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
title_fullStr | Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
title_full_unstemmed | Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
title_short | Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
title_sort | using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351329/ https://www.ncbi.nlm.nih.gov/pubmed/34372798 http://dx.doi.org/10.1186/s12885-021-08647-1 |
work_keys_str_mv | AT limengxiang usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT sunxiaomeng usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT chengweigang usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT ruanhaojie usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT liuke usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT chenpan usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT xuhaijun usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT gaoshegan usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT fengxiaoshan usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma AT qiyijun usingamachinelearningapproachtoidentifykeyprognosticmoleculesforesophagealsquamouscellcarcinoma |
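
Note on the ranking step in the description: the count of 131,071 classifiers equals 2^17 − 1, the number of non-empty subsets of 17 molecules, so the description is consistent with one classifier per subset (an inference from the numbers, not something the record states). The Python sketch below illustrates occurrence-frequency ranking under that assumption on a toy panel of four features. Only `SFN` comes from the record; `gene_B`, `gene_C`, `gene_D`, the function names and the `toy_auc` scoring rule are hypothetical stand-ins for a real cross-validated AUC and are not the authors' code.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(features):
    """Yield every non-empty subset of the feature list as a tuple."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def rank_by_occurrence(features, score_fn):
    """Rank features by how often they occur in above-average subsets.

    score_fn maps a feature subset to a score such as an AUC. A subset
    counts towards the ranking only if its score exceeds the mean score
    of all subsets.
    """
    scored = [(subset, score_fn(subset)) for subset in nonempty_subsets(features)]
    mean_score = sum(score for _, score in scored) / len(scored)
    counts = Counter({feature: 0 for feature in features})
    for subset, score in scored:
        if score > mean_score:
            counts.update(subset)
    return counts.most_common(), len(scored)


def toy_auc(subset):
    """Hypothetical score: subsets containing SFN do better (illustration only)."""
    base = 0.7 if "SFN" in subset else 0.5
    return base + 0.01 * len(subset)


# Toy panel: SFN is named in the record, the other names are placeholders.
panel = ["SFN", "gene_B", "gene_C", "gene_D"]
ranking, n_models = rank_by_occurrence(panel, toy_auc)

print(n_models)     # number of subsets evaluated for the toy panel
print(ranking[0])   # most frequently occurring feature and its count
print(2**17 - 1)    # subset count for a 17-molecule panel
```

In a real analysis `score_fn` would train and evaluate one of the five listed algorithms on each subset, which is far more expensive than this toy scoring rule.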