
The Analysis of Gene Expression Data Incorporating Tumor Purity Information

The tumor microenvironment is composed of tumor cells, stroma cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over cells in the microenvironment. However, research questions often seek answers about tumor cell...


Bibliographic Details
Main authors: Ahn, Seungjun; Grimes, Tyler; Datta, Somnath
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8419469/
https://www.ncbi.nlm.nih.gov/pubmed/34497631
http://dx.doi.org/10.3389/fgene.2021.642759
author Ahn, Seungjun; Grimes, Tyler; Datta, Somnath
collection PubMed
description The tumor microenvironment is composed of tumor cells, stromal cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over the cells in the microenvironment. However, research questions often seek answers about tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating TP information into two statistical methods used for analyzing gene expression data, namely differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample obtained by removing the low-TP samples, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression datasets related to three different cancers from The Cancer Genome Atlas (TCGA) for our investigation. The networks from Analysis 2 show a greater amount of differential connectivity between the two networks than those from Analysis 1 in all three cancer datasets. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of the DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across the three cancers. However, Analysis 3 identified additional cancer-related genes in both DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but that a more robust model is needed for DN analysis. Because the true DN and DE patterns are not known for the empirical datasets, simulated datasets can be used in future studies to examine the statistical properties of these methods.
format Online Article Text
id pubmed-8419469
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8419469 2021-09-07 The Analysis of Gene Expression Data Incorporating Tumor Purity Information. Ahn, Seungjun; Grimes, Tyler; Datta, Somnath. Front Genet. Genetics. Frontiers Media S.A. 2021-08-23. /pmc/articles/PMC8419469/ /pubmed/34497631 http://dx.doi.org/10.3389/fgene.2021.642759 Text en. Copyright © 2021 Ahn, Grimes and Datta. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title The Analysis of Gene Expression Data Incorporating Tumor Purity Information
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8419469/
https://www.ncbi.nlm.nih.gov/pubmed/34497631
http://dx.doi.org/10.3389/fgene.2021.642759