Developing prognostic gene panel of survival time in lung adenocarcinoma patients using machine learning

BACKGROUND: Transcriptome data generates massive amounts of information that can be used for characterization and prognosis of patient outcomes for many diseases. The goal of our research is to predict the survival time of lung adenocarcinoma patients and improve the accuracy of classifying the long...

Bibliographic Details
Main Authors: Liu, Yidi; Yang, Mu; Sun, Weiwei; Zhang, Mingqiang; Sun, Jiao; Wang, Wenjuan; Tang, Dongqi; Yuan, Dongfeng
Format: Online Article Text
Language: English
Published: AME Publishing Company 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8799101/
https://www.ncbi.nlm.nih.gov/pubmed/35117753
http://dx.doi.org/10.21037/tcr-19-2739
author Liu, Yidi
Yang, Mu
Sun, Weiwei
Zhang, Mingqiang
Sun, Jiao
Wang, Wenjuan
Tang, Dongqi
Yuan, Dongfeng
author_sort Liu, Yidi
collection PubMed
description BACKGROUND: Transcriptome data generates massive amounts of information that can be used for characterization and prognosis of patient outcomes for many diseases. The goal of our research is to predict the survival time of lung adenocarcinoma patients and improve the accuracy of classifying the long-survival cohort and short-survival cohort. METHODS: We used the Relief method to select prognostic features related to the survival time of lung adenocarcinoma patients, and predicted whether a patient's survival time is >3 years using eight machine learning algorithms (Support Vector Machines, Random Forests, Logistic Regression, Naïve Bayes, Linear Regression, Support Vector Regression (kernel Poly), Support Vector Regression (kernel Linear), and Ridge Regression). The best-performing algorithm was then chosen to build a predictive model of survival time of lung adenocarcinoma patients. Another dataset was used to verify the stability and suitability of this model. We also explored the mechanisms underlying RNA expression changes by examining the corresponding DNA mutations and DNA methylation patterns in the 22 selected genetic features. RESULTS: The best machine learning algorithm was Naïve Bayes (accuracy=75%, AUC=0.81) using the top 22 genetic features, and it also performed stably and well on another dataset. Across the 22 genes, the coupled mutation number was lower in the long-survival group (>6 years) than in the short-survival group (<1 year) (P=0.031). CONCLUSIONS: The expression of the gene panel can predict the survival time of lung adenocarcinoma patients using Naïve Bayes. These 22 genes do affect the survival time of lung adenocarcinoma.
format Online
Article
Text
id pubmed-8799101
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher AME Publishing Company
record_format MEDLINE/PubMed
spelling pubmed-8799101 2022-02-02 Transl Cancer Res Original Article
AME Publishing Company 2020-06 /pmc/articles/PMC8799101/ /pubmed/35117753 http://dx.doi.org/10.21037/tcr-19-2739 Text en 2020 Translational Cancer Research. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
title Developing prognostic gene panel of survival time in lung adenocarcinoma patients using machine learning
topic Original Article
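
The METHODS summary in this record describes selecting features with Relief and then classifying patients into survival groups with Naïve Bayes. The sketch below illustrates that general pattern only; it is not the authors' code. Everything in it is an assumption made for illustration: the data are synthetic (10 features, of which the first 3 carry signal, rather than the paper's 22 genes), the binary label stands in for "survival >3 years", the Relief variant visits every training sample once instead of drawing random samples, and the classifier is a plain Gaussian Naïve Bayes.

```python
import math
import random


def relief_weights(X, y):
    """Basic two-class Relief. Visits every sample once (a simplification of
    drawing m random samples) and returns one weight per feature."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale per-feature differences into [0, 1].
    span = [(max(c) - min(c)) or 1.0 for c in zip(*X)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest same-class sample
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(d):
            # Reward features that differ at the miss, penalise those that differ at the hit.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior under the fitted model."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(x, stats))
    return max(model, key=log_posterior)


# Synthetic stand-in data (hypothetical): label 1 plays the role of "long survival".
rng = random.Random(0)
N_FEATURES, N_INFORMATIVE, SHIFT = 10, 3, 3.0


def make_sample(label):
    # Only the first N_INFORMATIVE features depend on the label.
    return [rng.gauss(SHIFT * label if j < N_INFORMATIVE else 0.0, 1.0)
            for j in range(N_FEATURES)]


labels = [i % 2 for i in range(300)]
samples = [make_sample(t) for t in labels]
X_train, y_train = samples[:200], labels[:200]
X_test, y_test = samples[200:], labels[200:]

# Rank features on the training split only, then keep the top-weighted ones.
weights = relief_weights(X_train, y_train)
top_features = sorted(range(N_FEATURES), key=lambda j: weights[j],
                      reverse=True)[:N_INFORMATIVE]


def select(rows):
    return [[row[j] for j in top_features] for row in rows]


model = fit_gaussian_nb(select(X_train), y_train)
predictions = [predict(model, x) for x in select(X_test)]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

print("selected features:", sorted(top_features))
print("held-out accuracy:", round(accuracy, 2))
```

Ranking features on the training split alone keeps the held-out accuracy from being inflated by the selection step; whether the original study split its data this way is not stated in this record.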