β-empirical Bayes inference and model diagnosis of microarray data
BACKGROUND: Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Ba...
Main authors: | Hossain Mollah, Mohammad Manir; Haque Mollah, M Nurul; Kishino, Hirohisa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464654/ https://www.ncbi.nlm.nih.gov/pubmed/22713095 http://dx.doi.org/10.1186/1471-2105-13-135 |
_version_ | 1782245445919047680 |
---|---|
author | Hossain Mollah, Mohammad Manir; Haque Mollah, M Nurul; Kishino, Hirohisa |
author_facet | Hossain Mollah, Mohammad Manir; Haque Mollah, M Nurul; Kishino, Hirohisa |
author_sort | Hossain Mollah, Mohammad Manir |
collection | PubMed |
description | BACKGROUND: Microarray data enables the high-throughput survey of mRNA expression profiles at the genomic level; however, the data presents a challenging statistical problem because of the large number of transcripts with small sample sizes that are obtained. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of the microarray data, no model can explain the data fully. It is generally difficult to scrutinize the irregular patterns of expression that are not expected by the usual statistical gene-by-gene models. RESULTS: As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure which can be regarded as an 'evidence-based' weighted (quasi-) likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f^β(y_t|θ). Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between the robustness and efficiency, is selected by maximizing the predictive β_0-likelihood by cross-validation. The proposed β-EB approach identified six significant (p < 10^−5) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach. CONCLUSIONS: The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model statistically. When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies. |
format | Online Article Text |
id | pubmed-3464654 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3464654 2012-10-05 β-empirical Bayes inference and model diagnosis of microarray data Hossain Mollah, Mohammad Manir; Haque Mollah, M Nurul; Kishino, Hirohisa BMC Bioinformatics Methodology Article BioMed Central 2012-06-19 /pmc/articles/PMC3464654/ /pubmed/22713095 http://dx.doi.org/10.1186/1471-2105-13-135 Text en Copyright ©2012 Mollah et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Hossain Mollah, Mohammad Manir Haque Mollah, M Nurul Kishino, Hirohisa β-empirical Bayes inference and model diagnosis of microarray data |
title | β-empirical Bayes inference and model diagnosis of microarray data |
title_sort | β-empirical bayes inference and model diagnosis of microarray data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3464654/ https://www.ncbi.nlm.nih.gov/pubmed/22713095 http://dx.doi.org/10.1186/1471-2105-13-135 |
work_keys_str_mv | AT hossainmollahmohammadmanir bempiricalbayesinferenceandmodeldiagnosisofmicroarraydata AT haquemollahmnurul bempiricalbayesinferenceandmodeldiagnosisofmicroarraydata AT kishinohirohisa bempiricalbayesinferenceandmodeldiagnosisofmicroarraydata |
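
The abstract describes each transcript's weight as a power function of its likelihood, so that observations with low likelihood contribute little to the fit. The Python sketch below illustrates only that weighting idea on a toy problem: estimating a single location parameter under a Gaussian likelihood with a fixed, assumed scale. It is not the hierarchical β-EB model of the paper; the function names `beta_weights` and `beta_weighted_mean`, the fixed `sigma`, the choice of `beta`, and the toy data are all assumptions made for illustration.

```python
import math


def beta_weights(values, mu, sigma, beta):
    """Gaussian likelihood raised to the power beta, rescaled so that a
    point lying exactly at mu gets weight 1 (illustrative assumption)."""
    return [math.exp(-beta * (v - mu) ** 2 / (2.0 * sigma ** 2)) for v in values]


def beta_weighted_mean(values, sigma=1.0, beta=0.2, n_iter=50):
    """Fixed-point iteration: recompute the weights at the current estimate,
    then update the estimate as the weighted mean of the observations."""
    mu = sorted(values)[len(values) // 2]  # start from a middle order statistic
    for _ in range(n_iter):
        w = beta_weights(values, mu, sigma, beta)
        mu = sum(wi * vi for wi, vi in zip(w, values)) / sum(w)
    return mu, beta_weights(values, mu, sigma, beta)


# Toy data: five values near zero and one gross outlier (hypothetical numbers).
data = [0.1, -0.2, 0.05, 0.3, -0.15, 8.0]

mu_robust, weights = beta_weighted_mean(data, sigma=1.0, beta=0.5)
mu_plain, _ = beta_weighted_mean(data, sigma=1.0, beta=0.0)  # beta = 0: ordinary mean

print("beta = 0.5 estimate:", round(mu_robust, 3))
print("beta = 0   estimate:", round(mu_plain, 3))
print("weights:", [round(w, 3) for w in weights])
```

With `beta = 0` every weight equals 1 and the estimate reduces to the ordinary sample mean, which the outlier pulls upward. With a positive `beta` the outlier receives a weight close to zero, and that small weight is also what flags it for closer inspection, in the spirit of the diagnostic use of β-weights mentioned in the abstract.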