Cargando…
Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method
Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. The identification of prognostic biomarkers and prediction of prognosis for LUAD patients is necessary. Materials and Methods: In this study, LUAD RNA-Seq data and clinical data from the Cance...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Ivyspring International Publisher
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6959071/ https://www.ncbi.nlm.nih.gov/pubmed/31956375 http://dx.doi.org/10.7150/jca.34585 |
_version_ | 1783487529304981504 |
---|---|
author | Ma, Baoshan Geng, Yao Meng, Fanyu Yan, Ge Song, Fengju |
author_facet | Ma, Baoshan Geng, Yao Meng, Fanyu Yan, Ge Song, Fengju |
author_sort | Ma, Baoshan |
collection | PubMed |
description | Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. The identification of prognostic biomarkers and prediction of prognosis for LUAD patients is necessary. Materials and Methods: In this study, LUAD RNA-Seq data and clinical data from the Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and II (n = 168). The cohort I was used for model construction, and the cohort II and data from Gene Expression Omnibus (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were utilized for validation. First, the survival-related seed genes were selected from the cohort I using the machine learning model (random survival forest, RSF), and then in order to improve prediction accuracy, the forward selection model was utilized to identify the prognosis-related key genes among the seed genes using the clinically-integrated RNA-Seq data. Second, the survival risk score system was constructed by using these key genes in the cohort II, the GSE72094 cohort and the GSE11969 cohort, and the evaluation metrics such as HR, p value and C-index were calculated to validate the proposed method. Third, the developed approach was compared with the previous five prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) have been applied to the identified seed genes and key genes. Results and Conclusion: Based on the RSF model and clinically-integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes could achieve a strong power for prognostic prediction of LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656), and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670). The experimental results of three independent validation cohorts showed that compared with the traditional Cox model and the use of standalone RNA-Seq data, the machine-learning-based method effectively improved the prediction accuracy of LUAD prognosis, and the derived model was also superior to the other five existing prediction models. KEGG pathway analysis found eleven of the sixteen genes were associated with Nicotine addiction. Thirteen of the sixteen genes were reported for the first time as the LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology. |
format | Online Article Text |
id | pubmed-6959071 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Ivyspring International Publisher |
record_format | MEDLINE/PubMed |
spelling | pubmed-69590712020-01-18 Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method Ma, Baoshan Geng, Yao Meng, Fanyu Yan, Ge Song, Fengju J Cancer Research Paper Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. The identification of prognostic biomarkers and prediction of prognosis for LUAD patients is necessary. Materials and Methods: In this study, LUAD RNA-Seq data and clinical data from the Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and II (n = 168). The cohort I was used for model construction, and the cohort II and data from Gene Expression Omnibus (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were utilized for validation. First, the survival-related seed genes were selected from the cohort I using the machine learning model (random survival forest, RSF), and then in order to improve prediction accuracy, the forward selection model was utilized to identify the prognosis-related key genes among the seed genes using the clinically-integrated RNA-Seq data. Second, the survival risk score system was constructed by using these key genes in the cohort II, the GSE72094 cohort and the GSE11969 cohort, and the evaluation metrics such as HR, p value and C-index were calculated to validate the proposed method. Third, the developed approach was compared with the previous five prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) have been applied to the identified seed genes and key genes. Results and Conclusion: Based on the RSF model and clinically-integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes could achieve a strong power for prognostic prediction of LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656), and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670). The experimental results of three independent validation cohorts showed that compared with the traditional Cox model and the use of standalone RNA-Seq data, the machine-learning-based method effectively improved the prediction accuracy of LUAD prognosis, and the derived model was also superior to the other five existing prediction models. KEGG pathway analysis found eleven of the sixteen genes were associated with Nicotine addiction. Thirteen of the sixteen genes were reported for the first time as the LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology. Ivyspring International Publisher 2020-01-01 /pmc/articles/PMC6959071/ /pubmed/31956375 http://dx.doi.org/10.7150/jca.34585 Text en © The author(s) This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). See http://ivyspring.com/terms for full terms and conditions. |
spellingShingle | Research Paper Ma, Baoshan Geng, Yao Meng, Fanyu Yan, Ge Song, Fengju Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method |
title | Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method |
title_full | Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method |
title_fullStr | Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method |
title_full_unstemmed | Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method |
title_short | Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method |
title_sort | identification of a sixteen-gene prognostic biomarker for lung adenocarcinoma using a machine learning method |
topic | Research Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6959071/ https://www.ncbi.nlm.nih.gov/pubmed/31956375 http://dx.doi.org/10.7150/jca.34585 |
work_keys_str_mv | AT mabaoshan identificationofasixteengeneprognosticbiomarkerforlungadenocarcinomausingamachinelearningmethod AT gengyao identificationofasixteengeneprognosticbiomarkerforlungadenocarcinomausingamachinelearningmethod AT mengfanyu identificationofasixteengeneprognosticbiomarkerforlungadenocarcinomausingamachinelearningmethod AT yange identificationofasixteengeneprognosticbiomarkerforlungadenocarcinomausingamachinelearningmethod AT songfengju identificationofasixteengeneprognosticbiomarkerforlungadenocarcinomausingamachinelearningmethod |