Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy

BACKGROUND: Cancer of unknown primary (CUP) is a type of malignant tumor, which is histologically diagnosed as a metastatic carcinoma while the tissue-of-origin cannot be identified. CUP accounts for roughly 5% of all cancers. Traditional treatment for CUP is primarily broad-spectrum chemotherapy; h...

Bibliographic Details
Main authors: Li, Ruixi; Liao, Bo; Wang, Bo; Dai, Chan; Liang, Xin; Tian, Geng; Wu, Fangxiang
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7904362/
https://www.ncbi.nlm.nih.gov/pubmed/33681364
http://dx.doi.org/10.1155/2021/6653793
author Li, Ruixi
Liao, Bo
Wang, Bo
Dai, Chan
Liang, Xin
Tian, Geng
Wu, Fangxiang
collection PubMed
description BACKGROUND: Cancer of unknown primary (CUP) is a type of malignant tumor, which is histologically diagnosed as a metastatic carcinoma while the tissue-of-origin cannot be identified. CUP accounts for roughly 5% of all cancers. Traditional treatment for CUP is primarily broad-spectrum chemotherapy; however, the prognosis is relatively poor. Thus, it is of clinical importance to accurately infer the tissue-of-origin of CUP. METHODS: We developed a gradient boosting framework to trace tissue-of-origin of 20 types of solid tumors. Specifically, we downloaded the expression profiles of 20,501 genes for 7713 samples from The Cancer Genome Atlas (TCGA), which were used as the training data set. The RNA-seq data of 79 tumor samples from 6 cancer types with known origins were also downloaded from the Gene Expression Omnibus (GEO) for an independent data set. RESULTS: 400 genes were selected to train a gradient boosting model for identification of the primary site of the tumor. The overall 10-fold cross-validation accuracy of our method was 96.1% across 20 types of cancer, while the accuracy for the independent data set reached 83.5%. CONCLUSION: Our gradient boosting framework was proven to be accurate in identifying tumor tissue-of-origin on both training data and independent testing data, which might be of practical usage.
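The abstract reports a 10-fold cross-validation accuracy over 7713 TCGA samples. As a rough illustration of what that evaluation scheme involves, the sketch below partitions sample indices into 10 folds and pools the correct predictions across folds. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say whether folds were shuffled or stratified by cancer type, and the names `k_fold_indices`, `cross_val_accuracy`, and `predict_fold` are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for plain, unshuffled k-fold CV.

    Assumption: contiguous folds; the paper may have shuffled or stratified.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(labels, predict_fold, k=10):
    """Pool correct predictions over all folds and return overall accuracy.

    `predict_fold(train, test)` is a hypothetical callback that fits a model
    on the `train` indices and returns one predicted label per `test` index.
    """
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        preds = predict_fold(train, test)
        correct += sum(p == labels[i] for p, i in zip(preds, test))
    return correct / len(labels)


# Fold sizes for the 7713 training samples mentioned in the abstract.
fold_sizes = [len(test) for _, test in k_fold_indices(7713, 10)]
print(fold_sizes)
```

In this scheme every sample is held out exactly once, so the reported figure is an average over predictions made on data the model did not see during fitting. A gradient boosting classifier would be plugged in through the `predict_fold` callback.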
format Online
Article
Text
id pubmed-7904362
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7904362 2021-03-04 Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy Li, Ruixi Liao, Bo Wang, Bo Dai, Chan Liang, Xin Tian, Geng Wu, Fangxiang Biomed Res Int Research Article Hindawi 2021-02-17 /pmc/articles/PMC7904362/ /pubmed/33681364 http://dx.doi.org/10.1155/2021/6653793 Text en Copyright © 2021 Ruixi Li et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of Tumor Tissue of Origin with RNA-Seq Data and Using Gradient Boosting Strategy
topic Research Article