Detect tissue heterogeneity in gene expression data with BioQC
BACKGROUND: Gene expression data can be compromised by cells originating from other tissues than the target tissue of profiling. Failures in detecting such tissue heterogeneity have profound implications on data interpretation and reproducibility. A computational tool explicitly addressing the issue...
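Main Authors: | Zhang, Jitao David, Hatje, Klas, Sturm, Gregor, Broger, Clemens, Ebeling, Martin, Burtin, Martine, Terzi, Fabiola, Pomposiello, Silvia Ines, Badi, Laura |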
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5379536/ https://www.ncbi.nlm.nih.gov/pubmed/28376718 http://dx.doi.org/10.1186/s12864-017-3661-2 |
_version_ | 1782519626720083968 |
---|---|
author | Zhang, Jitao David Hatje, Klas Sturm, Gregor Broger, Clemens Ebeling, Martin Burtin, Martine Terzi, Fabiola Pomposiello, Silvia Ines Badi, Laura |
author_facet | Zhang, Jitao David Hatje, Klas Sturm, Gregor Broger, Clemens Ebeling, Martin Burtin, Martine Terzi, Fabiola Pomposiello, Silvia Ines Badi, Laura |
author_sort | Zhang, Jitao David |
collection | PubMed |
description | BACKGROUND: Gene expression data can be compromised by cells originating from other tissues than the target tissue of profiling. Failures in detecting such tissue heterogeneity have profound implications on data interpretation and reproducibility. A computational tool explicitly addressing the issue is warranted. RESULTS: We introduce BioQC, a R/Bioconductor software package to detect tissue heterogeneity in gene expression data. To this end BioQC implements a computationally efficient Wilcoxon-Mann-Whitney test and provides more than 150 signatures of tissue-enriched genes derived from large-scale transcriptomics studies. Simulation experiments show that BioQC is both fast and sensitive in detecting tissue heterogeneity. In a case study with whole-organ profiling data, BioQC predicted contamination events that are confirmed by quantitative RT-PCR. Applied to transcriptomics data of the Genotype-Tissue Expression (GTEx) project, BioQC reveals clustering of samples and suggests that some samples likely suffer from tissue heterogeneity. CONCLUSIONS: Our experience with gene expression data indicates a prevalence of tissue heterogeneity that often goes unnoticed. BioQC addresses the issue by integrating prior knowledge with a scalable algorithm. We propose BioQC as a first-line tool to ensure quality and reproducibility of gene expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3661-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5379536 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53795362017-04-07 Detect tissue heterogeneity in gene expression data with BioQC Zhang, Jitao David Hatje, Klas Sturm, Gregor Broger, Clemens Ebeling, Martin Burtin, Martine Terzi, Fabiola Pomposiello, Silvia Ines Badi, Laura BMC Genomics Software BACKGROUND: Gene expression data can be compromised by cells originating from other tissues than the target tissue of profiling. Failures in detecting such tissue heterogeneity have profound implications on data interpretation and reproducibility. A computational tool explicitly addressing the issue is warranted. RESULTS: We introduce BioQC, a R/Bioconductor software package to detect tissue heterogeneity in gene expression data. To this end BioQC implements a computationally efficient Wilcoxon-Mann-Whitney test and provides more than 150 signatures of tissue-enriched genes derived from large-scale transcriptomics studies. Simulation experiments show that BioQC is both fast and sensitive in detecting tissue heterogeneity. In a case study with whole-organ profiling data, BioQC predicted contamination events that are confirmed by quantitative RT-PCR. Applied to transcriptomics data of the Genotype-Tissue Expression (GTEx) project, BioQC reveals clustering of samples and suggests that some samples likely suffer from tissue heterogeneity. CONCLUSIONS: Our experience with gene expression data indicates a prevalence of tissue heterogeneity that often goes unnoticed. BioQC addresses the issue by integrating prior knowledge with a scalable algorithm. We propose BioQC as a first-line tool to ensure quality and reproducibility of gene expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3661-2) contains supplementary material, which is available to authorized users. 
BioMed Central 2017-04-04 /pmc/articles/PMC5379536/ /pubmed/28376718 http://dx.doi.org/10.1186/s12864-017-3661-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Zhang, Jitao David Hatje, Klas Sturm, Gregor Broger, Clemens Ebeling, Martin Burtin, Martine Terzi, Fabiola Pomposiello, Silvia Ines Badi, Laura Detect tissue heterogeneity in gene expression data with BioQC |
title | Detect tissue heterogeneity in gene expression data with BioQC |
title_full | Detect tissue heterogeneity in gene expression data with BioQC |
title_fullStr | Detect tissue heterogeneity in gene expression data with BioQC |
title_full_unstemmed | Detect tissue heterogeneity in gene expression data with BioQC |
title_short | Detect tissue heterogeneity in gene expression data with BioQC |
title_sort | detect tissue heterogeneity in gene expression data with bioqc |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5379536/ https://www.ncbi.nlm.nih.gov/pubmed/28376718 http://dx.doi.org/10.1186/s12864-017-3661-2 |
work_keys_str_mv | AT zhangjitaodavid detecttissueheterogeneityingeneexpressiondatawithbioqc AT hatjeklas detecttissueheterogeneityingeneexpressiondatawithbioqc AT sturmgregor detecttissueheterogeneityingeneexpressiondatawithbioqc AT brogerclemens detecttissueheterogeneityingeneexpressiondatawithbioqc AT ebelingmartin detecttissueheterogeneityingeneexpressiondatawithbioqc AT burtinmartine detecttissueheterogeneityingeneexpressiondatawithbioqc AT terzifabiola detecttissueheterogeneityingeneexpressiondatawithbioqc AT pomposiellosilviaines detecttissueheterogeneityingeneexpressiondatawithbioqc AT badilaura detecttissueheterogeneityingeneexpressiondatawithbioqc |
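
The description above states that BioQC combines a Wilcoxon-Mann-Whitney test with signatures of tissue-enriched genes. As a rough illustration of that idea only, the sketch below tests whether the genes of one signature rank higher than the remaining genes in a single sample. It is not BioQC's implementation: the function names, gene names, and expression values are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signature_enrichment(expression, signature):
    """One-sided rank-sum test: are signature genes expressed higher than the rest?

    expression: dict mapping gene name -> expression value for one sample.
    signature: set of gene names (hypothetical tissue-enriched genes).
    Returns (U, p) where p comes from a normal approximation (assumption:
    no tie correction, no continuity correction).
    """
    genes = list(expression)
    ranks = average_ranks([expression[g] for g in genes])
    in_signature = [g in signature for g in genes]
    n1 = sum(in_signature)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need genes both inside and outside the signature")
    rank_sum = sum(r for r, flag in zip(ranks, in_signature) if flag)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u, p


# Made-up sample: three hypothetical "liver" genes are the most highly expressed.
sample = {
    "LIVER_A": 9.8, "LIVER_B": 9.1, "LIVER_C": 8.7,
    "G1": 2.0, "G2": 3.1, "G3": 4.2, "G4": 5.3, "G5": 1.4, "G6": 6.5, "G7": 7.6,
}
u_top, p_top = signature_enrichment(sample, {"LIVER_A", "LIVER_B", "LIVER_C"})
u_low, p_low = signature_enrichment(sample, {"G1", "G2", "G5"})
print(u_top, p_top)  # U is 21.0, the maximum possible for 3 vs 7 genes
print(u_low, p_low)  # U is 0.0, so no evidence of enrichment
```

A small p-value for a signature of a tissue other than the profiled one would be the kind of signal the abstract describes as possible tissue heterogeneity. In practice the test would be repeated for every signature and every sample.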