SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression
MOTIVATION: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancers. Our goal was to create reliable prediction models using TCGA data and validat...
Main authors: | Nayshool, Omri; Kol, Nitzan; Javaski, Elisheva; Amariglio, Ninette; Rechavi, Gideon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications 2022 |
Subjects: | Software or Database Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9549197/ https://www.ncbi.nlm.nih.gov/pubmed/36225330 http://dx.doi.org/10.1177/11769351221127875 |
_version_ | 1784805614756560896 |
---|---|
author | Nayshool, Omri; Kol, Nitzan; Javaski, Elisheva; Amariglio, Ninette; Rechavi, Gideon |
author_facet | Nayshool, Omri Kol, Nitzan Javaski, Elisheva Amariglio, Ninette Rechavi, Gideon |
author_sort | Nayshool, Omri |
collection | PubMed |
description | MOTIVATION: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancers. Our goal was to create reliable prediction models using TCGA data and validate them using an external dataset. RESULTS: For each of 16 TCGA cancer-type cohorts, we optimized a Random Forest prediction model using a parameter grid search followed by a backward feature elimination loop for dimensionality reduction. After each feature was removed, the model was retrained and the area under the curve of the receiver operating characteristic (AUC-ROC) was calculated on test data. Five prediction models achieved an AUC-ROC greater than 0.80. We used Clinical Proteomic Tumor Analysis Consortium v3 (CPTAC3) data for validation. The most enriched pathways for the top models were those involved in basic functions related to tumorigenesis and organ development. Enrichment was explored for 2 prediction models of the TCGA-KIRP cohort, one with 42 genes (AUC-ROC = 0.86) and the other with 300 genes (AUC-ROC = 0.85). The most enriched networks for both models share only 5 network nodes: DMBT1, IL11, HOXB6, TRIB3, and PIM1. These genes play a significant role in renal cancer and might be used for prognosis prediction and as candidate therapeutic targets. AVAILABILITY AND IMPLEMENTATION: The prediction models were created and tested using the Python scikit-learn package. They are freely accessible via a user-friendly web interface called surviveAI at https://tinyurl.com/surviveai. |
format | Online Article Text |
id | pubmed-9549197 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9549197 2022-10-11 SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression. Nayshool, Omri; Kol, Nitzan; Javaski, Elisheva; Amariglio, Ninette; Rechavi, Gideon. Cancer Inform. Software or Database Review. SAGE Publications 2022-10-07 /pmc/articles/PMC9549197/ /pubmed/36225330 http://dx.doi.org/10.1177/11769351221127875 Text en © The Author(s) 2022. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression |
topic | Software or Database Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9549197/ https://www.ncbi.nlm.nih.gov/pubmed/36225330 http://dx.doi.org/10.1177/11769351221127875 |
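
The description above outlines a workflow of grid-searching a Random Forest and then running a backward feature elimination loop, retraining and scoring AUC-ROC on test data after each removal. The following is a minimal sketch of that idea with scikit-learn, not the authors' actual code. Several things are assumptions made for illustration: the synthetic data from `make_classification` stands in for TCGA expression data, the parameter grid is hypothetical, the helper name `backward_elimination` is invented here, and dropping the feature with the lowest impurity-based importance is one plausible elimination criterion (the record does not say how the removed feature is chosen).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def backward_elimination(X_train, y_train, X_test, y_test, params,
                         min_features=2, seed=0):
    """Remove one feature at a time, retraining and scoring after each removal.

    Returns (best_auc, best_feature_indices, history), where history is a list
    of (n_features, test_auc) pairs in elimination order.
    """
    kept = list(range(X_train.shape[1]))
    history = []
    best_auc, best_kept = -1.0, list(kept)
    while len(kept) >= min_features:
        model = RandomForestClassifier(random_state=seed, **params)
        model.fit(X_train[:, kept], y_train)
        proba = model.predict_proba(X_test[:, kept])[:, 1]
        auc = roc_auc_score(y_test, proba)
        history.append((len(kept), auc))
        if auc >= best_auc:  # on ties, prefer the smaller feature set
            best_auc, best_kept = auc, list(kept)
        # Assumption: drop the feature with the lowest impurity-based importance.
        kept.pop(int(np.argmin(model.feature_importances_)))
    return best_auc, best_kept, history


# Synthetic stand-in for an expression matrix (rows: patients, columns: genes).
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: hyperparameter grid search (hypothetical grid, 3-fold CV on train).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# Step 2: backward elimination using the selected hyperparameters.
best_auc, best_kept, history = backward_elimination(
    X_train, y_train, X_test, y_test, search.best_params_)

print("best hyperparameters:", search.best_params_)
print("features kept:", sorted(best_kept))
print("test AUC-ROC of best subset: %.3f" % best_auc)
```

Because the feature subset in this sketch is chosen by its score on the test split, the reported AUC-ROC is optimistic; checking the final model on an independent dataset, as the description does with CPTAC3, is what gives a trustworthy estimate.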