A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing
Carcinoma of unknown primary (CUP) is a type of metastatic cancer with tissue-of-origin (TOO) unidentifiable by traditional methods. CUP patients typically have poor prognosis but therapy targeting the original cancer tissue can significantly improve patients’ prognosis. Thus, it’s critical to devel...
Main Authors: | He, Binsheng; Sun, Hongmei; Bao, Meihua; Li, Haigang; He, Jianjun; Tian, Geng; Wang, Bo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10505149/ https://www.ncbi.nlm.nih.gov/pubmed/37717102 http://dx.doi.org/10.1038/s41598-023-42465-8 |
_version_ | 1785106858798743552 |
---|---|
author | He, Binsheng Sun, Hongmei Bao, Meihua Li, Haigang He, Jianjun Tian, Geng Wang, Bo |
author_sort | He, Binsheng |
collection | PubMed |
description | Carcinoma of unknown primary (CUP) is a type of metastatic cancer whose tissue-of-origin (TOO) cannot be identified by traditional methods. CUP patients typically have a poor prognosis, but therapy targeting the original cancer tissue can significantly improve it. Thus, it is critical to develop accurate computational methods to infer cancer TOO. While qPCR- or microarray-based methods are effective in inferring TOO for most cancer types, their overall prediction accuracy still has room for improvement. In this study, we propose a cross-cohort computational framework to trace the TOO of 32 cancer types based on RNA sequencing (RNA-seq). Specifically, we employed logistic regression models to select 80 genes for each cancer type, creating a combined 1356-gene set, based on transcriptomic data from 9911 tissue samples covering the 32 cancer types with known TOO from The Cancer Genome Atlas (TCGA). The selected genes are enriched in both tissue-specific and tissue-general functions. The cross-validation accuracy of our framework reaches 97.50% across all cancer types. Furthermore, we tested our model on the TCGA metastatic dataset and the International Cancer Genome Consortium (ICGC) dataset, achieving accuracies of 91.09% and 82.67%, respectively, despite differences in experimental procedures and pipelines. In conclusion, we developed an accurate and robust computational framework for identifying TOO, which holds promise for clinical applications. Our code is available at http://github.com/wangbo00129/classifybysklearn. |
format | Online Article Text |
id | pubmed-10505149 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10505149 2023-09-18 A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing He, Binsheng Sun, Hongmei Bao, Meihua Li, Haigang He, Jianjun Tian, Geng Wang, Bo Sci Rep Article Nature Publishing Group UK 2023-09-16 /pmc/articles/PMC10505149/ /pubmed/37717102 http://dx.doi.org/10.1038/s41598-023-42465-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10505149/ https://www.ncbi.nlm.nih.gov/pubmed/37717102 http://dx.doi.org/10.1038/s41598-023-42465-8 |
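
The gene-selection step summarized in the description field (one logistic regression model per cancer type, a fixed number of top genes kept per type, and the per-type lists merged into one combined set) can be illustrated with a small sketch. This is not the authors' implementation, which lives in the repository linked in the record. It assumes a one-vs-rest setup, ranking genes by absolute coefficient, and a hand-written gradient-descent fit. The toy data and the names `fit_logistic`, `select_gene_set`, and `make_toy_data` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300, l2=0.01):
    """Fit binary logistic regression by batch gradient descent.

    Returns (weights, bias). L2 regularization is applied to the weights only.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_gene_set(X, labels, genes, k):
    """Assumed selection rule: for each cancer type, fit a one-vs-rest model,
    keep the k genes with the largest absolute coefficients, and return the
    sorted union over all types."""
    selected = set()
    for cancer_type in sorted(set(labels)):
        y = [1.0 if lab == cancer_type else 0.0 for lab in labels]
        w, _ = fit_logistic(X, y)
        ranked = sorted(range(len(genes)), key=lambda j: abs(w[j]), reverse=True)
        selected.update(genes[j] for j in ranked[:k])
    return sorted(selected)


def make_toy_data(seed=0):
    """Hypothetical data: 3 types, 6 genes, one elevated marker gene per type."""
    rng = random.Random(seed)
    genes = [f"gene{j}" for j in range(6)]
    X, labels = [], []
    for t_idx, cancer_type in enumerate(["A", "B", "C"]):
        for _ in range(20):
            row = [rng.gauss(0.0, 0.1) for _ in genes]
            row[t_idx] += 2.0  # marker gene for this type
            X.append(row)
            labels.append(cancer_type)
    return X, labels, genes


if __name__ == "__main__":
    X, labels, genes = make_toy_data()
    print(select_gene_set(X, labels, genes, k=2))
```

Because the per-type lists can share genes, the union can be smaller than (number of types) × k. The same effect would be consistent with the record's figures, where 32 types × 80 genes gives at most 2560 genes but the combined set has 1356. On the toy data, selecting 2 genes for each of 3 types should likewise yield fewer than 6 distinct genes.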