
Putative biomarkers for predicting tumor sample purity based on gene expression data

Bibliographic Details
Main authors: Li, Yuanyuan, Umbach, David M., Bingham, Adrienna, Li, Qi-Jing, Zhuang, Yuan, Li, Leping
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933652/
https://www.ncbi.nlm.nih.gov/pubmed/31881847
http://dx.doi.org/10.1186/s12864-019-6412-8
author Li, Yuanyuan
Umbach, David M.
Bingham, Adrienna
Li, Qi-Jing
Zhuang, Yuan
Li, Leping
collection PubMed
description BACKGROUND: Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) play an important role in tumor biology. The ability to determine tumor purity is important for understanding the roles of cancerous and non-cancerous cells in a tumor. METHODS: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. RESULTS: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. CONCLUSIONS: Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
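The abstract summarizes predictive performance as the correlation between observed and predicted tumor purity, together with the root mean square error (RMSE). As a hedged illustration of how such summary metrics are computed (not the authors' code or pipeline), the Python sketch below implements Pearson correlation, RMSE, and a rank-based Spearman correlation using only the standard library. This record does not say whether the reported ρ = 0.88 is a Spearman or a Pearson coefficient, so both are shown as an assumption. The function names and the toy values in the usage block are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy purity values for five samples; NOT data from the study.
    observed = [0.90, 0.60, 0.75, 0.40, 0.80]
    predicted = [0.85, 0.65, 0.70, 0.50, 0.82]
    print(f"Pearson r    = {pearson_r(observed, predicted):.3f}")
    print(f"Spearman rho = {spearman_rho(observed, predicted):.3f}")
    print(f"RMSE         = {rmse(observed, predicted):.3f}")
```

Spearman correlation only reflects whether predictions preserve the ordering of samples by purity, whereas Pearson correlation and RMSE are also sensitive to how close the predicted values are to the observed ones, which is why studies of this kind often report a correlation alongside an error measure.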
format Online
Article
Text
id pubmed-6933652
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6933652 2019-12-30 Putative biomarkers for predicting tumor sample purity based on gene expression data Li, Yuanyuan Umbach, David M. Bingham, Adrienna Li, Qi-Jing Zhuang, Yuan Li, Leping BMC Genomics Research Article BACKGROUND: Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) play an important role in tumor biology. The ability to determine tumor purity is important for understanding the roles of cancerous and non-cancerous cells in a tumor. METHODS: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. RESULTS: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. CONCLUSIONS: Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data. BioMed Central 2019-12-27 /pmc/articles/PMC6933652/ /pubmed/31881847 http://dx.doi.org/10.1186/s12864-019-6412-8 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Putative biomarkers for predicting tumor sample purity based on gene expression data
topic Research Article