The Analysis of Gene Expression Data Incorporating Tumor Purity Information
The tumor microenvironment is composed of tumor cells, stroma cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over cells in the microenvironment. However, research questions often seek answers about tumor cell...
Main authors: | Ahn, Seungjun; Grimes, Tyler; Datta, Somnath |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8419469/ https://www.ncbi.nlm.nih.gov/pubmed/34497631 http://dx.doi.org/10.3389/fgene.2021.642759 |
_version_ | 1783748759915593728 |
---|---|
author | Ahn, Seungjun Grimes, Tyler Datta, Somnath |
author_facet | Ahn, Seungjun Grimes, Tyler Datta, Somnath |
author_sort | Ahn, Seungjun |
collection | PubMed |
description | The tumor microenvironment is composed of tumor cells, stromal cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over the cells in the microenvironment. However, research questions often seek answers about tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating TP information into two statistical methods used for analyzing gene expression data, namely, differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample obtained by removing the low-TP samples, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression datasets, related to three different cancers, from The Cancer Genome Atlas (TCGA) for our investigation. The networks from Analysis 2 show a greater amount of differential connectivity between the two networks than those from Analysis 1 in all three cancer datasets. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of the DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across the three cancers. However, Analysis 3 identified additional cancer-related genes in both DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but that a more robust model is needed for DN analysis. However, because the true DN or DE patterns are not known for the empirical datasets, simulated datasets can be used to study the statistical properties of these methods in future studies. |
format | Online Article Text |
id | pubmed-8419469 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8419469 2021-09-07 The Analysis of Gene Expression Data Incorporating Tumor Purity Information Ahn, Seungjun Grimes, Tyler Datta, Somnath Front Genet Genetics Frontiers Media S.A. 2021-08-23 /pmc/articles/PMC8419469/ /pubmed/34497631 http://dx.doi.org/10.3389/fgene.2021.642759 Text en Copyright © 2021 Ahn, Grimes and Datta. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Ahn, Seungjun Grimes, Tyler Datta, Somnath The Analysis of Gene Expression Data Incorporating Tumor Purity Information |
title | The Analysis of Gene Expression Data Incorporating Tumor Purity Information |
title_full | The Analysis of Gene Expression Data Incorporating Tumor Purity Information |
title_fullStr | The Analysis of Gene Expression Data Incorporating Tumor Purity Information |
title_full_unstemmed | The Analysis of Gene Expression Data Incorporating Tumor Purity Information |
title_short | The Analysis of Gene Expression Data Incorporating Tumor Purity Information |
title_sort | analysis of gene expression data incorporating tumor purity information |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8419469/ https://www.ncbi.nlm.nih.gov/pubmed/34497631 http://dx.doi.org/10.3389/fgene.2021.642759 |
work_keys_str_mv | AT ahnseungjun theanalysisofgeneexpressiondataincorporatingtumorpurityinformation AT grimestyler theanalysisofgeneexpressiondataincorporatingtumorpurityinformation AT dattasomnath theanalysisofgeneexpressiondataincorporatingtumorpurityinformation AT ahnseungjun analysisofgeneexpressiondataincorporatingtumorpurityinformation AT grimestyler analysisofgeneexpressiondataincorporatingtumorpurityinformation AT dattasomnath analysisofgeneexpressiondataincorporatingtumorpurityinformation |
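
The three strategies named in the abstract (ignore TP, drop low-TP samples, adjust for TP as a covariate) can be illustrated for a single gene with a small sketch. This is not the authors' pipeline: the simulated data, the purity cutoff of 0.6, and the helper names `ols` and `group_effect` are all hypothetical, and a plain least-squares fit stands in for whatever DE model the paper actually uses. The simulated gene has no true group effect, but its expression depends on TP, and TP differs between the two groups.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def group_effect(expr, group, tp, mode, tp_cutoff=0.6):
    """Estimated group coefficient for one gene under one strategy.
    tp_cutoff is an illustrative assumption, not a value from the paper."""
    if mode == "ignore":        # Analysis 1: TP is not used at all
        rows = list(range(len(expr)))
        X = [[1.0, group[i]] for i in rows]
    elif mode == "truncate":    # Analysis 2: drop samples with low TP
        rows = [i for i in range(len(expr)) if tp[i] >= tp_cutoff]
        X = [[1.0, group[i]] for i in rows]
    elif mode == "covariate":   # Analysis 3: TP enters the linear model
        rows = list(range(len(expr)))
        X = [[1.0, group[i], tp[i]] for i in rows]
    else:
        raise ValueError(f"unknown mode: {mode}")
    y = [expr[i] for i in rows]
    return ols(X, y)[1]  # coefficient on the group indicator


# Hypothetical simulated gene: expression depends on TP only, and the
# two groups have different TP distributions (a confounded design).
random.seed(0)
n = 100  # samples per group
group = [0.0] * n + [1.0] * n
tp = ([random.uniform(0.5, 0.9) for _ in range(n)]
      + [random.uniform(0.3, 0.7) for _ in range(n)])
expr = [2.0 + 3.0 * t + random.gauss(0.0, 0.1) for t in tp]

effects = {mode: group_effect(expr, group, tp, mode)
           for mode in ("ignore", "truncate", "covariate")}
for mode, est in effects.items():
    print(f"{mode:>9}: estimated group effect = {est:+.3f}")
```

In this toy setting the unadjusted fit attributes the TP-driven difference to the group label, while the covariate-adjusted fit returns a group effect near zero. Real DE analyses would fit such a model per gene and add a significance test with multiple-testing correction, which this sketch omits.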