SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression

MOTIVATION: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancers. Our goal was to create reliable prediction models using TCGA data and validat...

Bibliographic Details
Main authors: Nayshool, Omri; Kol, Nitzan; Javaski, Elisheva; Amariglio, Ninette; Rechavi, Gideon
Format: Online Article Text
Language: English
Published: SAGE Publications 2022
Subjects: Software or Database Review
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9549197/
https://www.ncbi.nlm.nih.gov/pubmed/36225330
http://dx.doi.org/10.1177/11769351221127875
author Nayshool, Omri
Kol, Nitzan
Javaski, Elisheva
Amariglio, Ninette
Rechavi, Gideon
collection PubMed
description MOTIVATION: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancer. Our goal was to create reliable prediction models using TCGA data and validate them using an external dataset. RESULTS: For each of 16 TCGA cancer-type cohorts, we optimized a Random Forest prediction model using a parameter grid search followed by a backward feature elimination loop for dimensionality reduction. Each time a feature was removed, the model was retrained and the area under the receiver operating characteristic curve (AUC-ROC) was calculated on test data. Five prediction models achieved an AUC-ROC above 0.80. We used Clinical Proteomic Tumor Analysis Consortium v3 (CPTAC3) data for validation. The most enriched pathways for the top models were those involved in basic functions related to tumorigenesis and organ development. Enrichment was explored for 2 prediction models of the TCGA-KIRP cohort, one with 42 genes (AUC-ROC = 0.86) and the other with 300 genes (AUC-ROC = 0.85). The most enriched networks for both models share only 5 network nodes: DMBT1, IL11, HOXB6, TRIB3, and PIM1. These genes play a significant role in renal cancer and might be used for prognosis prediction and as candidate therapeutic targets. AVAILABILITY AND IMPLEMENTATION: The prediction models were created and tested using the Python scikit-learn package. They are freely accessible through a user-friendly web interface, surviveAI, at https://tinyurl.com/surviveai.
format Online
Article
Text
id pubmed-9549197
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-9549197 2022-10-11 SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression Nayshool, Omri; Kol, Nitzan; Javaski, Elisheva; Amariglio, Ninette; Rechavi, Gideon Cancer Inform Software or Database Review
SAGE Publications 2022-10-07 /pmc/articles/PMC9549197/ /pubmed/36225330 http://dx.doi.org/10.1177/11769351221127875 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression
topic Software or Database Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9549197/
https://www.ncbi.nlm.nih.gov/pubmed/36225330
http://dx.doi.org/10.1177/11769351221127875
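
The workflow summarized in the description field (a parameter grid search for a Random Forest, then a backward feature elimination loop in which the model is retrained and scored by test-set AUC-ROC after each removal) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. The synthetic data, the tiny parameter grid, the helper name `backward_elimination`, and the choice to drop the feature with the lowest impurity-based importance are all assumptions made for illustration; the record does not say how the feature to remove was chosen.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def backward_elimination(X_train, y_train, X_test, y_test, params, min_features=2):
    """Drop one feature per round, retraining and scoring the model each time.

    Returns a list of (kept_feature_indices, test_auc) pairs, one per round.
    """
    kept = list(range(X_train.shape[1]))
    history = []
    while True:
        model = RandomForestClassifier(random_state=0, **params)
        model.fit(X_train[:, kept], y_train)
        scores = model.predict_proba(X_test[:, kept])[:, 1]
        history.append((list(kept), roc_auc_score(y_test, scores)))
        if len(kept) <= min_features:
            break
        # Assumption: remove the feature with the lowest impurity-based importance.
        weakest = int(model.feature_importances_.argmin())
        kept.pop(weakest)
    return history


# Synthetic stand-in for an expression matrix (samples x genes) with a binary
# survival label. This is NOT TCGA data.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: parameter grid search (a deliberately small, hypothetical grid).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_train, y_train)

# Step 2: backward feature elimination using the tuned parameters.
history = backward_elimination(X_train, y_train, X_test, y_test, grid.best_params_)
best_features, best_auc = max(history, key=lambda item: item[1])
print(f"best test AUC-ROC {best_auc:.2f} with {len(best_features)} features")
```

Choosing the feature subset by test-set AUC-ROC makes that score optimistic, which is one reason a separate external cohort (CPTAC3 in the description) is useful for validation.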