
A full Bayesian hierarchical mixture model for the variance of gene differential expression

BACKGROUND: In many laboratory-based high throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the s...


Bibliographic Details
Main authors: Manda, Samuel OM; Walls, Rebecca E; Gilthorpe, Mark S
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876253/
https://www.ncbi.nlm.nih.gov/pubmed/17439644
http://dx.doi.org/10.1186/1471-2105-8-124
_version_ 1782133521878351872
author Manda, Samuel OM
Walls, Rebecca E
Gilthorpe, Mark S
collection PubMed
description BACKGROUND: In many laboratory-based high throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the standard approach to address this is to take a log2 transformation. In such circumstances, it is then common to assume that gene variability is constant when an analysis of these data is undertaken. However, this is perhaps too stringent an assumption. More careful inspection reveals that the simple log2 transformation does not remove the problem of heteroscedasticity. An alternative strategy is to assume independent gene-specific variances, although this is again problematic because variance estimates based on few replications are highly unstable. More meaningful and reliable comparisons of gene expression might be achieved, for different conditions or different tissue samples, where the test statistics are based on accurate estimates of gene variability, a crucial step in the identification of differentially expressed genes. RESULTS: We propose a Bayesian mixture model, which classifies genes according to similarity in their variance. The result is that genes in the same latent class share a similar variance, estimated from a larger number of replicates than purely those per gene, i.e. the total of all replicates of all genes in the same latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on the similarity of their variances. CONCLUSION: The mixture variance model provides a realistic and flexible estimate for the variance of gene expression data under limited replicates. We believe that in using the latent class variances, estimated from a larger number of genes in each derived latent group, the p-values obtained are more robust than those obtained using either a constant or a gene-specific variance estimate.
format Text
id pubmed-1876253
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1876253 2007-05-22 A full Bayesian hierarchical mixture model for the variance of gene differential expression Manda, Samuel OM Walls, Rebecca E Gilthorpe, Mark S BMC Bioinformatics Methodology Article BioMed Central 2007-04-17 /pmc/articles/PMC1876253/ /pubmed/17439644 http://dx.doi.org/10.1186/1471-2105-8-124 Text en Copyright © 2007 Manda et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A full Bayesian hierarchical mixture model for the variance of gene differential expression
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876253/
https://www.ncbi.nlm.nih.gov/pubmed/17439644
http://dx.doi.org/10.1186/1471-2105-8-124