Tissue heterogeneity is prevalent in gene expression studies
Lack of reproducibility in gene expression studies is a serious issue being actively addressed by the biomedical research community. Besides established factors such as batch effects and incorrect sample annotations, we recently reported tissue heterogeneity, a consequence of unintended profiling of...
Main authors: | Sturm, Gregor; List, Markus; Zhang, Jitao David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Opinion Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415427/ https://www.ncbi.nlm.nih.gov/pubmed/34514392 http://dx.doi.org/10.1093/nargab/lqab077 |
_version_ | 1783747968166264832 |
---|---|
author | Sturm, Gregor; List, Markus; Zhang, Jitao David |
author_sort | Sturm, Gregor |
collection | PubMed |
description | Lack of reproducibility in gene expression studies is a serious issue being actively addressed by the biomedical research community. Besides established factors such as batch effects and incorrect sample annotations, we recently reported tissue heterogeneity, a consequence of unintended profiling of cells of other origins than the tissue of interest, as a source of variance. Although tissue heterogeneity exacerbates irreproducibility, its prevalence in gene expression data remains unknown. Here, we systematically analyse 2 667 publicly available gene expression datasets covering 76 576 samples. Using two independent data compendia and a reproducible, open-source software pipeline, we find a prevalence of tissue heterogeneity in gene expression data that affects between 1 and 40% of the samples, depending on the tissue type. We discover both cases of severe heterogeneity, which may be caused by mistakes in annotation or sample handling, and cases of moderate heterogeneity, which are likely caused by tissue infiltration or sample contamination. Our analysis establishes tissue heterogeneity as a widespread phenomenon in publicly available gene expression datasets, which constitutes an important source of variance that should not be ignored. Consequently, we advocate the application of quality-control methods such as BioQC to detect tissue heterogeneity prior to mining or analysing gene expression data. |
format | Online Article Text |
id | pubmed-8415427 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8415427 2021-09-09 Tissue heterogeneity is prevalent in gene expression studies Sturm, Gregor; List, Markus; Zhang, Jitao David NAR Genom Bioinform Opinion Article Oxford University Press 2021-09-03 /pmc/articles/PMC8415427/ /pubmed/34514392 http://dx.doi.org/10.1093/nargab/lqab077 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Tissue heterogeneity is prevalent in gene expression studies |
topic | Opinion Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415427/ https://www.ncbi.nlm.nih.gov/pubmed/34514392 http://dx.doi.org/10.1093/nargab/lqab077 |