
Tissue heterogeneity is prevalent in gene expression studies

Bibliographic Details
Main authors: Sturm, Gregor; List, Markus; Zhang, Jitao David
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415427/
https://www.ncbi.nlm.nih.gov/pubmed/34514392
http://dx.doi.org/10.1093/nargab/lqab077

Journal: NAR Genom Bioinform (NAR Genomics and Bioinformatics)
Article type: Opinion Article
Publication date: 2021-09-03
Collection: PubMed
Record ID: pubmed-8415427
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Copyright: © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract: Lack of reproducibility in gene expression studies is a serious issue being actively addressed by the biomedical research community. Besides established factors such as batch effects and incorrect sample annotations, we recently reported tissue heterogeneity, a consequence of unintended profiling of cells of other origins than the tissue of interest, as a source of variance. Although tissue heterogeneity exacerbates irreproducibility, its prevalence in gene expression data remains unknown. Here, we systematically analyse 2 667 publicly available gene expression datasets covering 76 576 samples. Using two independent data compendia and a reproducible, open-source software pipeline, we find a prevalence of tissue heterogeneity in gene expression data that affects between 1 and 40% of the samples, depending on the tissue type. We discover both cases of severe heterogeneity, which may be caused by mistakes in annotation or sample handling, and cases of moderate heterogeneity, which are likely caused by tissue infiltration or sample contamination. Our analysis establishes tissue heterogeneity as a widespread phenomenon in publicly available gene expression datasets, which constitutes an important source of variance that should not be ignored. Consequently, we advocate the application of quality-control methods such as BioQC to detect tissue heterogeneity prior to mining or analysing gene expression data.