Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection
Main authors: | Hu, Ming; Qin, Zhaohui S. |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2009 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637418/ https://www.ncbi.nlm.nih.gov/pubmed/19214232 http://dx.doi.org/10.1371/journal.pone.0004495 |
author | Hu, Ming; Qin, Zhaohui S. |
---|---|
collection | PubMed |
description | In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene such as a key regulatory protein. Multiple studies have been conducted using various correlation measures to identify co-expressed genes. While working well for small datasets, the heterogeneity introduced by increased sample size inevitably reduces the sensitivity and specificity of these approaches. This is because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It is capable of detecting co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power of identifying co-expressed genes in large-scale gene expression datasets. Our simulation studies suggest that this method outperforms existing query tools based on correlation coefficients or mutual information. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors. |
format | Text |
id | pubmed-2637418 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
publication date | 2009-02-13 |
rights | Hu et al.; license: http://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637418/ https://www.ncbi.nlm.nih.gov/pubmed/19214232 http://dx.doi.org/10.1371/journal.pone.0004495 |
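
The abstract's point that co-expression confined to a subset of conditions weakens correlation-based queries can be illustrated with a toy example. The Python sketch below is not the authors' Bayesian model-selection algorithm. It only applies plain Pearson correlation to hypothetical profiles (`query`, `target`), and it assumes the co-expressed subset of conditions is already known, which is exactly what the method described in the abstract is designed to detect. In the made-up data, the target gene is a linear transform of the query in the first six conditions and the relationship reverses in the other six.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy profiles over 12 experimental conditions (not real data).
query = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
# Conditions 0-5: target = 2 * query + 1 (a linear transform of the query).
# Conditions 6-11: target = 15 - 2 * query (the relationship reverses).
target = [3, 5, 7, 9, 11, 13, 13, 11, 9, 7, 5, 3]

# Assumption for illustration: the co-expressed subset is known in advance.
subset = range(6)

r_all = pearson(query, target)
r_subset = pearson([query[i] for i in subset], [target[i] for i in subset])

print(f"r over all conditions:      {r_all:.3f}")
print(f"r over co-expressed subset: {r_subset:.3f}")
```

Run as a script, it reports a correlation of 0.000 over all twelve conditions and 1.000 over the first six. The relationship is perfect where it holds, but the conditions where it reverses cancel it out entirely when all samples are pooled. Because Pearson correlation is unchanged by a positive linear transform such as 2q + 1, the subset score is exactly 1.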