
Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma



Bibliographic details
Main authors: Li, Meng-Xiang; Sun, Xiao-Meng; Cheng, Wei-Gang; Ruan, Hao-Jie; Liu, Ke; Chen, Pan; Xu, Hai-Jun; Gao, She-Gan; Feng, Xiao-Shan; Qi, Yi-Jun
Format: Online article, text
Language: English
Published: BioMed Central, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8351329/
https://www.ncbi.nlm.nih.gov/pubmed/34372798
http://dx.doi.org/10.1186/s12885-021-08647-1
Collection: PubMed
Abstract:
BACKGROUND: Many prognostic biomarkers reported to date for esophageal squamous cell carcinoma (ESCC) show low reproducibility because of the high molecular heterogeneity of ESCC. The purpose of this study was to identify the optimal biomarkers for ESCC using machine learning algorithms.
METHODS: Biomarkers related to clinical survival, recurrence, or therapeutic response in patients with ESCC were identified by searching literature databases. Forty-eight biomarkers linked to recurrence or prognosis of ESCC were used to construct a molecular interaction network with NetBox, from which functional modules were then identified. Publicly available ESCC mRNA transcriptome datasets, GSE53625 and TCGA-ESCC, were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA), respectively. Five machine learning algorithms, namely logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and XGBoost, were used to build prognostic classifiers for feature selection. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the performance of the prognostic classifiers. The identified molecules were ranked by importance according to their occurrence frequencies in the prognostic classifiers. Kaplan-Meier survival analysis and the log-rank test were used to assess the statistical significance of differences in overall survival.
RESULTS: A total of 48 clinically proven molecules associated with ESCC progression were used to construct a molecular interaction network with 3 functional modules comprising 17 component molecules. For each machine learning algorithm, 131,071 prognostic classifiers were built from these 17 molecules. The 17 molecules were then ranked by their occurrence frequencies in the classifiers whose AUC exceeded the mean of all 131,071 AUCs. By this ranking, stratifin, encoded by SFN, was identified as the optimal prognostic biomarker for ESCC, and its performance was further validated in 2 additional independent cohorts.
CONCLUSION: Occurrence frequencies across various feature selection approaches reflect the degree of clinical importance of a molecule, and stratifin is an optimal prognostic biomarker for ESCC.
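The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and rests on several labeled assumptions: 131,071 equals 2^17 − 1, which suggests (an assumption, not stated in this record) that one classifier was built per non-empty subset of the 17 molecules; `stand_in_scores` is a hypothetical placeholder for a trained LR, SVM, ANN, RF, or XGBoost model; the molecule names `A`, `B`, `C` and the toy data are hypothetical; AUC is computed on the same toy samples rather than on held-out data; and how counts are pooled across the five algorithms is not described here, so only one algorithm is shown.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def all_feature_subsets(features):
    """Yield every non-empty subset of `features`: 2**n - 1 subsets in total."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def auc(labels, scores):
    """ROC AUC computed as the probability that a random positive sample
    outscores a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stand_in_scores(samples, subset):
    """HYPOTHETICAL stand-in for a trained classifier: scores each sample
    by summing the values of the selected features."""
    return [sum(sample[f] for f in subset) for sample in samples]


def rank_by_occurrence(subset_aucs):
    """Count how often each feature appears in subsets whose AUC exceeds
    the mean AUC over all subsets; most frequent first."""
    threshold = mean(subset_aucs.values())
    counts = Counter()
    for subset, value in subset_aucs.items():
        if value > threshold:
            counts.update(subset)
    # Sort by descending count, breaking ties alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): integer expression levels; label 1 = poor outcome.
samples = [
    {"A": 9, "B": 2, "C": 5},
    {"A": 8, "B": 7, "C": 1},
    {"A": 7, "B": 1, "C": 9},
    {"A": 2, "B": 6, "C": 4},
    {"A": 1, "B": 9, "C": 3},
    {"A": 3, "B": 3, "C": 8},
]
labels = [1, 1, 1, 0, 0, 0]

subset_aucs = {
    subset: auc(labels, stand_in_scores(samples, subset))
    for subset in all_feature_subsets(["A", "B", "C"])
}
ranking = rank_by_occurrence(subset_aucs)
print(ranking)  # [('A', 4), ('B', 2), ('C', 2)]

# With 17 candidate molecules there are 2**17 - 1 = 131,071 non-empty subsets.
print(sum(1 for _ in all_feature_subsets(range(17))))  # 131071
```

In the toy run, `A` appears in all 4 above-mean subsets and ranks first. `B` ties with `C` even though `B` alone has an AUC of only 2/9, because it rides along in strong subsets that also contain `A`: occurrence frequency measures how often a molecule appears in good classifiers, not its standalone performance.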
Record ID: pubmed-8351329
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Cancer (Research Article), BioMed Central; published 2021-08-09
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.