
Detect tissue heterogeneity in gene expression data with BioQC

BACKGROUND: Gene expression data can be compromised by cells originating from tissues other than the profiled target tissue. Failure to detect such tissue heterogeneity has profound implications for data interpretation and reproducibility. A computational tool explicitly addressing the issue...


Bibliographic Details
Main authors: Zhang, Jitao David; Hatje, Klas; Sturm, Gregor; Broger, Clemens; Ebeling, Martin; Burtin, Martine; Terzi, Fabiola; Pomposiello, Silvia Ines; Badi, Laura
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5379536/
https://www.ncbi.nlm.nih.gov/pubmed/28376718
http://dx.doi.org/10.1186/s12864-017-3661-2
_version_ 1782519626720083968
author Zhang, Jitao David
Hatje, Klas
Sturm, Gregor
Broger, Clemens
Ebeling, Martin
Burtin, Martine
Terzi, Fabiola
Pomposiello, Silvia Ines
Badi, Laura
collection PubMed
description BACKGROUND: Gene expression data can be compromised by cells originating from tissues other than the profiled target tissue. Failure to detect such tissue heterogeneity has profound implications for data interpretation and reproducibility. A computational tool explicitly addressing the issue is warranted. RESULTS: We introduce BioQC, an R/Bioconductor software package to detect tissue heterogeneity in gene expression data. To this end, BioQC implements a computationally efficient Wilcoxon-Mann-Whitney test and provides more than 150 signatures of tissue-enriched genes derived from large-scale transcriptomics studies. Simulation experiments show that BioQC is both fast and sensitive in detecting tissue heterogeneity. In a case study with whole-organ profiling data, BioQC predicted contamination events that were confirmed by quantitative RT-PCR. Applied to transcriptomics data of the Genotype-Tissue Expression (GTEx) project, BioQC reveals clustering of samples and suggests that some samples likely suffer from tissue heterogeneity. CONCLUSIONS: Our experience with gene expression data indicates a prevalence of tissue heterogeneity that often goes unnoticed. BioQC addresses the issue by integrating prior knowledge with a scalable algorithm. We propose BioQC as a first-line tool to ensure quality and reproducibility of gene expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3661-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5379536
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5379536 2017-04-07 Detect tissue heterogeneity in gene expression data with BioQC Zhang, Jitao David Hatje, Klas Sturm, Gregor Broger, Clemens Ebeling, Martin Burtin, Martine Terzi, Fabiola Pomposiello, Silvia Ines Badi, Laura BMC Genomics Software BACKGROUND: Gene expression data can be compromised by cells originating from tissues other than the profiled target tissue. Failure to detect such tissue heterogeneity has profound implications for data interpretation and reproducibility. A computational tool explicitly addressing the issue is warranted. RESULTS: We introduce BioQC, an R/Bioconductor software package to detect tissue heterogeneity in gene expression data. To this end, BioQC implements a computationally efficient Wilcoxon-Mann-Whitney test and provides more than 150 signatures of tissue-enriched genes derived from large-scale transcriptomics studies. Simulation experiments show that BioQC is both fast and sensitive in detecting tissue heterogeneity. In a case study with whole-organ profiling data, BioQC predicted contamination events that were confirmed by quantitative RT-PCR. Applied to transcriptomics data of the Genotype-Tissue Expression (GTEx) project, BioQC reveals clustering of samples and suggests that some samples likely suffer from tissue heterogeneity. CONCLUSIONS: Our experience with gene expression data indicates a prevalence of tissue heterogeneity that often goes unnoticed. BioQC addresses the issue by integrating prior knowledge with a scalable algorithm. We propose BioQC as a first-line tool to ensure quality and reproducibility of gene expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3661-2) contains supplementary material, which is available to authorized users.
BioMed Central 2017-04-04 /pmc/articles/PMC5379536/ /pubmed/28376718 http://dx.doi.org/10.1186/s12864-017-3661-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Detect tissue heterogeneity in gene expression data with BioQC
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5379536/
https://www.ncbi.nlm.nih.gov/pubmed/28376718
http://dx.doi.org/10.1186/s12864-017-3661-2
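Illustrative note on the method named in the abstract: the description says BioQC applies a Wilcoxon-Mann-Whitney test to signatures of tissue-enriched genes. The sketch below is not BioQC's implementation (BioQC is an R/Bioconductor package) but a minimal, assumption-laden Python illustration of the underlying idea: rank all genes in one sample, then ask whether the genes of a tissue signature rank higher than the remaining genes. The function name `wmw_enrichment`, the gene identifiers, and the toy expression values are all hypothetical; the p-value uses the standard normal approximation with tie and continuity corrections.

```python
import math


def wmw_enrichment(expression, signature):
    """One-sided Wilcoxon-Mann-Whitney test (normal approximation).

    expression: dict mapping gene name -> expression value for ONE sample.
    signature:  set of gene names assumed to be enriched in some tissue.
    Returns (U, p) where a small p suggests the signature genes are
    ranked higher than the other genes in this sample.
    """
    genes = sorted(expression, key=expression.get)
    n = len(genes)

    # Assign 1-based ranks, averaging over tied expression values.
    ranks = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and expression[genes[j + 1]] == expression[genes[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for g in genes[i:j + 1]:
            ranks[g] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    inside = [g for g in genes if g in signature]
    n1, n2 = len(inside), n - len(inside)
    if n1 == 0 or n2 == 0:
        raise ValueError("need genes both inside and outside the signature")

    # U statistic for the signature genes.
    u = sum(ranks[g] for g in inside) - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values tied: no evidence either way

    z = (u - mean_u - 0.5) / math.sqrt(var_u)  # continuity correction
    p = 0.5 * math.erfc(z / math.sqrt(2))      # P(Z >= z)
    return u, p


# Toy sample: 20 hypothetical genes with increasing expression.
sample = {f"gene_{k:02d}": float(k) for k in range(20)}

# Hypothetical signature made of the five most highly expressed genes,
# mimicking a foreign-tissue signature that is unexpectedly "on".
high_signature = {f"gene_{k:02d}" for k in range(15, 20)}
# Hypothetical signature made of the five least expressed genes.
low_signature = {f"gene_{k:02d}" for k in range(5)}

u_high, p_high = wmw_enrichment(sample, high_signature)
u_low, p_low = wmw_enrichment(sample, low_signature)
print(f"high signature: U={u_high}, p={p_high:.4g}")
print(f"low signature:  U={u_low}, p={p_low:.4g}")
```

In this toy setting the highly expressed signature yields a small p-value and the lowly expressed one a p-value near 1. Running such a test for every signature against every sample would give a signature-by-sample score matrix in which an unexpected tissue signature stands out; how BioQC computes and reports its scores should be taken from the package documentation rather than from this sketch.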