High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

Bibliographic Details
Main authors: Cui, Weitong; Xue, Huaru; Wei, Lei; Jin, Jinghua; Tian, Xuewen; Wang, Qinglu
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2021
Subjects: Primary Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7845028/
https://www.ncbi.nlm.nih.gov/pubmed/33509298
http://dx.doi.org/10.1186/s40246-021-00308-5
author Cui, Weitong
Xue, Huaru
Wei, Lei
Jin, Jinghua
Tian, Xuewen
Wang, Qinglu
collection PubMed
description BACKGROUND: RNA sequencing (RNA-Seq) has been widely applied in oncology to monitor transcriptome changes. However, the emerging concern that high variation in gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for each number of biological replicates from 3 to 24 and explored why so many differentially expressed genes (DEGs) were not reproducible. RESULTS: Our findings demonstrate that poor reproducibility of DE results occurs not only with small sample sizes but also with relatively large ones. Many of the DEGs detected are specific to the samples in use rather than genuinely differentially expressed between conditions. Poor reproducibility of DE results is mainly caused by high variation in the expression level of the same gene across samples. Although biological variation may account for much of this variation, the effect of outlier count data also needs to be taken seriously, as outliers severely interfere with DE analysis. CONCLUSIONS: High heterogeneity exists not only in the tumor tissue samples of each cancer type studied but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining the generalization of differential expression results. Therefore, RNA-Seq experimental designs should use large sample sizes (at least 10 if possible) to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40246-021-00308-5.
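The abstract does not spell out the procedure, but the general idea of measuring DEG reproducibility at a fixed replicate count can be sketched as a split-half subsampling check: draw two disjoint sets of n replicates per condition, call DEGs in each, and score the overlap of the two DEG sets with a Jaccard index. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline. The data are synthetic (lognormal per-sample noise standing in for biological heterogeneity), and the DE call is a crude Welch t-statistic threshold on log2(count + 1) rather than a count-based tool such as DESeq2 or edgeR. All function names (`simulate`, `call_degs`, `jaccard`, `reproducibility`) and parameter values (`sigma`, `fold`, `t_cutoff`) are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for mean(b) - mean(a); needs >= 2 values per group."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se if se > 0 else 0.0


def call_degs(cond_a, cond_b, cols_a, cols_b, t_cutoff=3.0):
    """Crude DE call (hypothetical stand-in for DESeq2/edgeR):
    flag genes whose |Welch t| on log2(count + 1) exceeds t_cutoff."""
    degs = set()
    for gene in cond_a:
        a = [math.log2(cond_a[gene][i] + 1) for i in cols_a]
        b = [math.log2(cond_b[gene][i] + 1) for i in cols_b]
        if abs(welch_t(a, b)) > t_cutoff:
            degs.add(gene)
    return degs


def jaccard(x, y):
    """Overlap of two DEG sets; two empty sets count as 0.0 (nothing reproduced)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def reproducibility(cond_a, cond_b, n, trials=20, seed=0):
    """Mean Jaccard overlap between DEG sets called on pairs of disjoint
    n-replicate subsamples drawn from the same two conditions."""
    rng = random.Random(seed)
    n_a = len(next(iter(cond_a.values())))
    n_b = len(next(iter(cond_b.values())))
    if n < 2 or 2 * n > min(n_a, n_b):
        raise ValueError("need n >= 2 and at least 2*n replicates per condition")
    scores = []
    for _ in range(trials):
        ia = rng.sample(range(n_a), 2 * n)  # two disjoint halves for condition A
        ib = rng.sample(range(n_b), 2 * n)  # two disjoint halves for condition B
        first = call_degs(cond_a, cond_b, ia[:n], ib[:n])
        second = call_degs(cond_a, cond_b, ia[n:], ib[n:])
        scores.append(jaccard(first, second))
    return statistics.mean(scores)


def simulate(n_genes=200, n_reps=24, n_true=20, fold=4.0, sigma=0.8, seed=1):
    """Synthetic counts: the first n_true genes differ by `fold` between
    conditions; lognormal per-sample noise (sigma) mimics heterogeneity."""
    rng = random.Random(seed)
    cond_a, cond_b = {}, {}
    for g in range(n_genes):
        base = rng.uniform(50, 500)
        shift = fold if g < n_true else 1.0
        cond_a[f"g{g}"] = [round(base * rng.lognormvariate(0, sigma)) for _ in range(n_reps)]
        cond_b[f"g{g}"] = [round(base * shift * rng.lognormvariate(0, sigma)) for _ in range(n_reps)]
    return cond_a, cond_b


if __name__ == "__main__":
    cond_a, cond_b = simulate()
    for n in (3, 6, 12):
        print(f"n={n:2d}  mean split-half Jaccard = {reproducibility(cond_a, cond_b, n):.2f}")
```

Raising `sigma` in `simulate` mimics stronger heterogeneity. In this toy setup one would expect the overlap to fall as `sigma` grows and to rise with `n`, but the exact numbers depend on the seed and on the arbitrary `t_cutoff`.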
format Online
Article
Text
id pubmed-7845028
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7845028, 2021-02-01. Hum Genomics, Primary Research. BioMed Central, 2021-01-28. Text, en.
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis
topic Primary Research