
β-empirical Bayes inference and model diagnosis of microarray data

BACKGROUND: Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Ba...


Bibliographic Details
Main Authors: Hossain Mollah, Mohammad Manir; Haque Mollah, M Nurul; Kishino, Hirohisa
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464654/
https://www.ncbi.nlm.nih.gov/pubmed/22713095
http://dx.doi.org/10.1186/1471-2105-13-135
author Hossain Mollah, Mohammad Manir
Haque Mollah, M Nurul
Kishino, Hirohisa
collection PubMed
description BACKGROUND: Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of the microarray data, no model can explain the data fully. It is generally difficult to scrutinize the irregular patterns of expression that are not expected by the usual statistical gene by gene models. RESULTS: As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure which can be regarded as an ‘evidence-based’ weighted (quasi-) likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f^β(y_t|θ). Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between the robustness and efficiency, is selected by maximizing the predictive β_0-likelihood by cross-validation. The proposed β-EB approach identified six significant (p < 10^(−5)) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach. CONCLUSIONS: The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model statistically.
When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies.
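The weighting idea in the description (each observation weighted by its likelihood raised to the power β, so that unlikely observations count for less) can be illustrated with a minimal sketch. This is not the authors' β-EB implementation: it assumes a single normal model with a known scale, estimates only the location, and fixes β by hand rather than selecting it by cross-validation. The function names `beta_weights` and `beta_weighted_mean` and all parameter values are hypothetical choices made for this illustration.

```python
import numpy as np


def beta_weights(y, mu, sigma, beta):
    """Weight of each observation: its normal density raised to the power beta."""
    density = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return density ** beta


def beta_weighted_mean(y, sigma=1.0, beta=0.2, n_iter=100, tol=1e-10):
    """Fixed-point iteration for a likelihood-weighted location estimate.

    Assumption for this sketch: the scale sigma is known, and beta is supplied
    by the caller. With beta = 0 every weight equals 1 and the estimate is the
    ordinary sample mean.
    """
    y = np.asarray(y, dtype=float)
    mu = np.median(y)  # robust starting point
    for _ in range(n_iter):
        w = beta_weights(y, mu, sigma, beta)
        new_mu = np.sum(w * y) / np.sum(w)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, beta_weights(y, mu, sigma, beta)


# Simulated data (illustrative only): 95 regular values plus 5 contaminated ones.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 95)
outliers = rng.normal(8.0, 1.0, 5)
y = np.concatenate([clean, outliers])

mu_hat, w = beta_weighted_mean(y, sigma=1.0, beta=0.2)
print("sample mean:       ", y.mean())
print("beta-weighted mean:", mu_hat)
print("largest outlier weight / mean regular weight:", w[95:].max() / w[:95].mean())
```

In this toy setting the contaminated values receive weights far below those of the regular values, which is the same mechanism the description uses for diagnosis: observations with unexpectedly small weights are candidates for closer inspection.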
format Online
Article
Text
id pubmed-3464654
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3464654 2012-10-05 β-empirical Bayes inference and model diagnosis of microarray data Hossain Mollah, Mohammad Manir Haque Mollah, M Nurul Kishino, Hirohisa BMC Bioinformatics Methodology Article BACKGROUND: Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of the microarray data, no model can explain the data fully. It is generally difficult to scrutinize the irregular patterns of expression that are not expected by the usual statistical gene by gene models. RESULTS: As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure which can be regarded as an ‘evidence-based’ weighted (quasi-) likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f^β(y_t|θ). Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between the robustness and efficiency, is selected by maximizing the predictive β_0-likelihood by cross-validation. The proposed β-EB approach identified six significant (p < 10^(−5)) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach.
CONCLUSIONS: The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model statistically. When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies. BioMed Central 2012-06-19 /pmc/articles/PMC3464654/ /pubmed/22713095 http://dx.doi.org/10.1186/1471-2105-13-135 Text en Copyright ©2012 Mollah et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title β-empirical Bayes inference and model diagnosis of microarray data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464654/
https://www.ncbi.nlm.nih.gov/pubmed/22713095
http://dx.doi.org/10.1186/1471-2105-13-135