High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis
BACKGROUND: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely b...
Main authors: | Cui, Weitong; Xue, Huaru; Wei, Lei; Jin, Jinghua; Tian, Xuewen; Wang, Qinglu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Primary Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7845028/ https://www.ncbi.nlm.nih.gov/pubmed/33509298 http://dx.doi.org/10.1186/s40246-021-00308-5 |
_version_ | 1783644473131008000 |
---|---|
author | Cui, Weitong Xue, Huaru Wei, Lei Jin, Jinghua Tian, Xuewen Wang, Qinglu |
author_facet | Cui, Weitong Xue, Huaru Wei, Lei Jin, Jinghua Tian, Xuewen Wang, Qinglu |
author_sort | Cui, Weitong |
collection | PubMed |
description | BACKGROUND: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. RESULTS: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. CONCLUSIONS: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40246-021-00308-5. |
format | Online Article Text |
id | pubmed-7845028 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-78450282021-02-01 High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis Cui, Weitong Xue, Huaru Wei, Lei Jin, Jinghua Tian, Xuewen Wang, Qinglu Hum Genomics Primary Research BACKGROUND: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. RESULTS: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. CONCLUSIONS: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40246-021-00308-5. BioMed Central 2021-01-28 /pmc/articles/PMC7845028/ /pubmed/33509298 http://dx.doi.org/10.1186/s40246-021-00308-5 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Primary Research Cui, Weitong Xue, Huaru Wei, Lei Jin, Jinghua Tian, Xuewen Wang, Qinglu High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis |
title | High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis |
title_full | High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis |
title_fullStr | High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis |
title_full_unstemmed | High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis |
title_short | High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis |
title_sort | high heterogeneity undermines generalization of differential expression results in rna-seq analysis |
topic | Primary Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7845028/ https://www.ncbi.nlm.nih.gov/pubmed/33509298 http://dx.doi.org/10.1186/s40246-021-00308-5 |
work_keys_str_mv | AT cuiweitong highheterogeneityunderminesgeneralizationofdifferentialexpressionresultsinrnaseqanalysis AT xuehuaru highheterogeneityunderminesgeneralizationofdifferentialexpressionresultsinrnaseqanalysis AT weilei highheterogeneityunderminesgeneralizationofdifferentialexpressionresultsinrnaseqanalysis AT jinjinghua highheterogeneityunderminesgeneralizationofdifferentialexpressionresultsinrnaseqanalysis AT tianxuewen highheterogeneityunderminesgeneralizationofdifferentialexpressionresultsinrnaseqanalysis AT wangqinglu highheterogeneityunderminesgeneralizationofdifferentialexpressionresultsinrnaseqanalysis |