A new method for identifying bivariate differential expression in high dimensional microarray data using quadratic discriminant analysis
Main authors: | Arevalillo, Jorge M; Navarro, Hilario |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3247087/ https://www.ncbi.nlm.nih.gov/pubmed/22168481 http://dx.doi.org/10.1186/1471-2105-12-S12-S6 |
author | Arevalillo, Jorge M; Navarro, Hilario |
collection | PubMed |
description | BACKGROUND: One of the drawbacks we face when analyzing gene-to-phenotype associations in genomic data is the poor performance of the designed classifier, caused by the small-sample, high-dimensional data structures (n ≪ p) at hand. This is known as the peaking phenomenon, a common situation in the analysis of gene expression data. Highly predictive bivariate gene interactions whose marginals are useless for discrimination are also affected by this phenomenon, so they are commonly discarded by state-of-the-art sequential search algorithms. Such patterns are known as weak marginal/strong bivariate interactions. This paper addresses the problem of uncovering them in high-dimensional settings. RESULTS: We propose a new approach that uses quadratic discriminant analysis (QDA) as a search engine to detect such signals. The choice of QDA is justified by a simulation study on a benchmark of classifiers, which reveals its appealing properties. The procedure rests on an exhaustive search that explores the feature space blockwise: the space is divided into blocks, and the accuracy of QDA is assessed for the predictors within each pair of blocks; the block size is determined by the resistance of QDA to peaking. This search highlights chunks of features that are expected to contain the type of subtle interactions we are concerned with. A closer look at this smaller subset of features, by means of an exhaustive search guided by the QDA error rate over all pairwise input combinations within the subset, enables their final detection. The proposed method is applied both to synthetic data and to a public-domain microarray dataset. When applied to gene expression data, it yields pairs of genes that are not differentially expressed individually but exhibit subtle patterns of bivariate differential expression. CONCLUSIONS: We have proposed a novel approach for identifying weak marginal/strong bivariate interactions. Unlike standard approaches such as the top scoring pair (TSP) and CorScor, our procedure does not assume a specific shape of phenotype separation and may enrich the types of bivariate differential expression patterns that can be uncovered in high-dimensional data. |
id | pubmed-3247087 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Bioinformatics, Proceedings; BioMed Central, 2011-11-24 (pubmed-3247087, 2011-12-29) |
rights | Copyright ©2011 Arevalillo and Navarro; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Proceedings |
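
The abstract outlines the search only in prose. The Python sketch below is a hedged, minimal illustration of its final stage, an exhaustive search over all feature pairs ranked by QDA error, on synthetic data built to show a weak marginal/strong bivariate pattern. It is not the authors' implementation: the blockwise first stage is omitted, and the function names (`simulate`, `fit_qda_2d`, `qda_predict`, `rank_pairs`), the correlation-sign data model, the single hold-out error estimate and all parameter values are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def simulate(n, p, rho=0.9, seed=0):
    """Hypothetical synthetic data: features 0 and 1 have correlation +rho in
    class 0 and -rho in class 1, while every feature is N(0, 1) within each
    class, so no single feature separates the classes on its own."""
    rng = random.Random(seed)
    data, labels = [], []
    for k in range(n):
        y = k % 2                      # alternate labels to balance classes
        r = rho if y == 0 else -rho
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pair = [z1, r * z1 + math.sqrt(1 - r * r) * z2]
        noise = [rng.gauss(0, 1) for _ in range(p - 2)]
        data.append(pair + noise)
        labels.append(y)
    return data, labels


def fit_qda_2d(points, labels):
    """Per-class mean, inverse 2x2 covariance, log-determinant and log-prior."""
    model, n = {}, len(points)
    for c in sorted(set(labels)):
        xs = [q for q, y in zip(points, labels) if y == c]
        m = len(xs)
        mx = sum(q[0] for q in xs) / m
        my = sum(q[1] for q in xs) / m
        sxx = sum((q[0] - mx) ** 2 for q in xs) / (m - 1)
        syy = sum((q[1] - my) ** 2 for q in xs) / (m - 1)
        sxy = sum((q[0] - mx) * (q[1] - my) for q in xs) / (m - 1)
        det = sxx * syy - sxy * sxy
        inv = (syy / det, -sxy / det, sxx / det)   # entries of Sigma^{-1}
        model[c] = (mx, my, inv, math.log(det), math.log(m / n))
    return model


def qda_predict(model, q):
    """Assign q to the class with the largest quadratic discriminant score."""
    best, best_score = None, -math.inf
    for c, (mx, my, (a, b, d), logdet, logprior) in model.items():
        dx, dy = q[0] - mx, q[1] - my
        maha = a * dx * dx + 2 * b * dx * dy + d * dy * dy
        score = -0.5 * logdet - 0.5 * maha + logprior
        if score > best_score:
            best, best_score = c, score
    return best


def rank_pairs(data, labels, n_train):
    """Illustrative exhaustive search over all feature pairs, ranked by
    hold-out QDA error (a hypothetical stand-in, not the authors' code)."""
    results = []
    for i, j in combinations(range(len(data[0])), 2):
        pts = [(row[i], row[j]) for row in data]
        model = fit_qda_2d(pts[:n_train], labels[:n_train])
        test = zip(pts[n_train:], labels[n_train:])
        wrong = sum(qda_predict(model, q) != y for q, y in test)
        results.append((wrong / (len(data) - n_train), i, j))
    return sorted(results)


if __name__ == "__main__":
    data, labels = simulate(n=400, p=6)
    for err, i, j in rank_pairs(data, labels, n_train=200)[:3]:
        print(f"features ({i}, {j}): hold-out QDA error {err:.3f}")
```

With this hypothetical setup, pair (0, 1) should rank first with an error well below 0.5, while every other pair sits near chance, even though features 0 and 1 are individually useless. QDA is a natural fit here because the classes differ only in covariance: their means coincide, so a linear rule with a shared covariance matrix has nothing to separate. A faithful implementation would also need the blockwise first stage and a sounder error estimate (for example cross-validation) than the single hold-out split used in this sketch.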