A cross-cohort computational framework to trace tumor tissue-of-origin based on RNA sequencing

Bibliographic Details
Main authors: He, Binsheng; Sun, Hongmei; Bao, Meihua; Li, Haigang; He, Jianjun; Tian, Geng; Wang, Bo
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10505149/
https://www.ncbi.nlm.nih.gov/pubmed/37717102
http://dx.doi.org/10.1038/s41598-023-42465-8
collection PubMed
description Carcinoma of unknown primary (CUP) is a metastatic cancer whose tissue-of-origin (TOO) cannot be identified by traditional methods. CUP patients typically have a poor prognosis, but therapy targeting the original cancer tissue can significantly improve it. It is therefore critical to develop accurate computational methods to infer cancer TOO. Although qPCR- and microarray-based methods are effective in inferring TOO for most cancer types, their overall prediction accuracy still needs improvement. In this study, we propose a cross-cohort computational framework to trace the TOO of 32 cancer types based on RNA sequencing (RNA-seq). Specifically, we employed logistic regression models to select 80 genes for each cancer type, yielding a combined 1356-gene set, based on transcriptomic data from 9911 tissue samples covering the 32 cancer types with known TOO from The Cancer Genome Atlas (TCGA). The selected genes are enriched in both tissue-specific and tissue-general functions. The cross-validation accuracy of our framework reaches 97.50% across all cancer types. Furthermore, we tested the performance of our model on the TCGA metastatic dataset and the International Cancer Genome Consortium (ICGC) dataset, achieving accuracies of 91.09% and 82.67%, respectively, despite differences in experimental procedures and pipelines. In conclusion, we developed an accurate and robust computational framework for identifying TOO, which holds promise for clinical applications. Our code is available at http://github.com/wangbo00129/classifybysklearn.
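The gene-selection idea in the abstract (a fixed number of genes chosen per cancer type with logistic regression, merged into one gene set, then checked by cross-validation) can be illustrated with a minimal sketch. This is not the authors' code from the repository above: the one-vs-rest ranking by absolute coefficient, the helper names, the synthetic data, and the small value of k are all assumptions made for illustration, using scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler


def select_genes_per_class(X, y, k):
    """Return the sorted union of the top-k genes for each class.

    Assumed ranking rule: fit a one-vs-rest logistic regression per class
    and keep the k genes with the largest absolute coefficients.
    """
    selected = set()
    for label in np.unique(y):
        model = LogisticRegression(max_iter=1000)
        model.fit(X, (y == label).astype(int))
        top = np.argsort(np.abs(model.coef_[0]))[-k:]
        selected.update(top.tolist())
    return np.array(sorted(selected))


def cross_validated_accuracy(X, y, k, n_splits=5, seed=0):
    """Mean accuracy, with scaling and gene selection redone in every fold."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        scaler = StandardScaler().fit(X[train_idx])
        X_train = scaler.transform(X[train_idx])
        X_test = scaler.transform(X[test_idx])
        genes = select_genes_per_class(X_train, y[train_idx], k)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[:, genes], y[train_idx])
        scores.append(clf.score(X_test[:, genes], y[test_idx]))
    return float(np.mean(scores))


def make_toy_expression(n_classes=4, per_class=40, n_genes=200, seed=0):
    """Synthetic stand-in for an expression matrix (samples x genes)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_classes * per_class, n_genes))
    y = np.repeat(np.arange(n_classes), per_class)
    for c in range(n_classes):
        # Genes 5c..5c+4 act as hypothetical markers for class c.
        X[y == c, 5 * c:5 * c + 5] += 3.0
    return X, y


X, y = make_toy_expression()
genes = select_genes_per_class(StandardScaler().fit_transform(X), y, k=5)
accuracy = cross_validated_accuracy(X, y, k=5)
print(len(genes), round(accuracy, 3))
```

Because per-class selections can overlap, the merged set may be smaller than k times the number of classes, which is consistent with the abstract's combined set of 1356 genes being smaller than 32 × 80 = 2560. Repeating the selection inside each fold keeps held-out samples from influencing which genes are chosen.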
id pubmed-10505149
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10505149 2023-09-18 Sci Rep Article
Nature Publishing Group UK 2023-09-16 /pmc/articles/PMC10505149/ /pubmed/37717102 http://dx.doi.org/10.1038/s41598-023-42465-8 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.