Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods

BACKGROUND: Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. It is meaningful to explore...

Bibliographic Details
Main Authors: Cai, Qidong; He, Boxue; Zhang, Pengfei; Zhao, Zhenyu; Peng, Xiong; Zhang, Yuqian; Xie, Hui; Wang, Xiang
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7720605/
https://www.ncbi.nlm.nih.gov/pubmed/33287830
http://dx.doi.org/10.1186/s12967-020-02635-y
_version_ 1783619883473305600
author Cai, Qidong
He, Boxue
Zhang, Pengfei
Zhao, Zhenyu
Peng, Xiong
Zhang, Yuqian
Xie, Hui
Wang, Xiang
author_sort Cai, Qidong
collection PubMed
description BACKGROUND: Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. Exploring pivotal AS events (ASEs) with machine learning algorithms may deepen understanding and improve prognostic assessment of lung adenocarcinoma (LUAD). METHOD: RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and a random forest-based classifier, consisting of 12 ASEs, for identifying lymph node metastasis (LNM). Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival for LUAD patients using a Cox regression model, random survival forest analysis, and a forward selection model. Bioinformatics analyses were also applied to identify underlying mechanisms and associated upstream splicing factors (SFs). RESULTS: Each pair of ASEs was spliced from the same parent gene and exhibited perfect inverse intrapair correlation (correlation coefficient = −1). The 12-ASE-based classifier showed a robust ability to evaluate the LNM status of LUAD patients, with an area under the receiver operating characteristic (ROC) curve (AUC) above 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and the internal test cohort. Univariate and multivariate Cox regression indicated that the prognostic model could be used as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. The splicing network constructed from survival-related SFs and ASEs depicts the regulatory relationships between them. CONCLUSION: In summary, our study provides insight into LUAD research and management based on these AS biomarkers.
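The perfect inverse intrapair correlation reported in the description can be illustrated with a small sketch. It rests on an assumption that the record itself does not state: that the two events in a pair are complementary, so their percent-spliced-in (PSI) values sum to 1 in every sample. The PSI values and the names `pearson`, `psi_event_a`, and `psi_event_b` are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical PSI values for one event across five samples (illustrative only).
psi_event_a = [0.10, 0.35, 0.62, 0.80, 0.47]

# Assumption: the paired event is the exact complement in each sample.
psi_event_b = [1.0 - p for p in psi_event_a]

# r equals -1 up to floating-point rounding.
r = pearson(psi_event_a, psi_event_b)
print(round(r, 6))
```

Under this assumption the coefficient is −1 whatever the PSI values are, as long as they vary across samples, because one series is an exact decreasing linear function of the other.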
format Online
Article
Text
id pubmed-7720605
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7720605 2020-12-08 J Transl Med Research BioMed Central 2020-12-07 /pmc/articles/PMC7720605/ /pubmed/33287830 http://dx.doi.org/10.1186/s12967-020-02635-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7720605/
https://www.ncbi.nlm.nih.gov/pubmed/33287830
http://dx.doi.org/10.1186/s12967-020-02635-y