
A new method for identifying bivariate differential expression in high dimensional microarray data using quadratic discriminant analysis

BACKGROUND: One of the drawbacks we face when analyzing gene-to-phenotype associations in genomic data is the poor performance of the designed classifier caused by the small-sample, high-dimensional structure of the data (n ≪ p). This is known as the peaking phenomenon, a common situation in the an...


Bibliographic Details
Main authors: Arevalillo, Jorge M, Navarro, Hilario
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3247087/
https://www.ncbi.nlm.nih.gov/pubmed/22168481
http://dx.doi.org/10.1186/1471-2105-12-S12-S6
_version_ 1782220039295860736
author Arevalillo, Jorge M
Navarro, Hilario
author_facet Arevalillo, Jorge M
Navarro, Hilario
author_sort Arevalillo, Jorge M
collection PubMed
description BACKGROUND: One of the drawbacks we face when analyzing gene-to-phenotype associations in genomic data is the poor performance of the designed classifier caused by the small-sample, high-dimensional structure of the data (n ≪ p). This is known as the peaking phenomenon, a common situation in the analysis of gene expression data. Highly predictive bivariate gene interactions whose marginals are useless for discrimination are also affected by this phenomenon, so they are commonly discarded by state-of-the-art sequential search algorithms. Such patterns are known as weak marginal/strong bivariate interactions. This paper addresses the problem of uncovering them in high-dimensional settings. RESULTS: We propose a new approach that uses quadratic discriminant analysis (QDA) as a search engine to detect such signals. The choice of QDA is justified by a simulation study over a benchmark of classifiers, which reveals its appealing properties. The procedure rests on an exhaustive search that explores the feature space blockwise: the features are divided into blocks, and the accuracy of QDA is assessed for the predictors within each pair of blocks; the block size is determined by the resistance of QDA to peaking. This search highlights chunks of features that are expected to contain the type of subtle interactions we are concerned with. A closer look at this smaller subset of features, by means of an exhaustive search guided by the QDA error rate for all pairwise input combinations within the subset, enables their final detection. The proposed method is applied both to synthetic data and to a public-domain microarray data set. When applied to gene expression data, it leads to pairs of genes that are not univariately differentially expressed but exhibit subtle patterns of bivariate differential expression. CONCLUSIONS: We have proposed a novel approach for identifying weak marginal/strong bivariate interactions. Unlike standard approaches such as the top scoring pair (TSP) and CorScor, our procedure does not assume a specified shape of phenotype separation and may enrich the type of bivariate differential expression patterns that can be uncovered in high-dimensional data.
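The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the block size of 5, the number of retained block pairs (`top_k`), the small ridge term added to the covariance, the 5-fold cross-validation, and all function names are assumptions made for illustration. The toy data are also much smaller than a real microarray (n = 200, p = 60), chosen only so that the example runs quickly. Features 7 and 41 are given correlation +0.95 in one class and -0.95 in the other, with identical standard normal marginals in both classes, so neither is informative on its own.

```python
from itertools import combinations, combinations_with_replacement
import numpy as np

def qda_fit(X, y, ridge=1e-3):
    """Per class: mean, inverse covariance, log-determinant, log-prior."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # Small ridge term (an assumption) keeps the covariance invertible.
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + ridge * np.eye(X.shape[1])
        params[int(c)] = (Xc.mean(axis=0), np.linalg.inv(cov),
                          np.linalg.slogdet(cov)[1], np.log(len(Xc) / len(X)))
    return params

def qda_predict(params, X):
    """Assign each row to the class with the largest quadratic discriminant score."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, prec, logdet, logprior = params[c]
        d = X - mu
        maha = np.einsum("ij,jk,ik->i", d, prec, d)  # squared Mahalanobis distances
        scores.append(-0.5 * (maha + logdet) + logprior)
    return np.array(classes)[np.argmax(scores, axis=0)]

def cv_accuracy(X, y, folds=5, seed=0):
    """K-fold cross-validated accuracy of QDA on the given columns."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        pred = qda_predict(qda_fit(X[train], y[train]), X[test])
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)

def blockwise_search(X, y, block_size=5, top_k=2):
    """Stage 1: score every pair of feature blocks, keep features of the best pairs."""
    p = X.shape[1]
    blocks = [np.arange(s, min(s + block_size, p)) for s in range(0, p, block_size)]
    scored = []
    for i, j in combinations_with_replacement(range(len(blocks)), 2):
        feats = np.union1d(blocks[i], blocks[j])
        scored.append((cv_accuracy(X[:, feats], y), i, j))
    scored.sort(reverse=True)
    keep = np.unique(np.concatenate(
        [np.union1d(blocks[i], blocks[j]) for _, i, j in scored[:top_k]]))
    return keep, scored

def pairwise_search(X, y, candidates):
    """Stage 2: exhaustive QDA scoring of all feature pairs among the candidates."""
    results = [(cv_accuracy(X[:, [f, g]], y), int(f), int(g))
               for f, g in combinations(candidates, 2)]
    return sorted(results, reverse=True)

# Synthetic data: features a and b are N(0, 1) in both classes; only the sign
# of their correlation differs between the classes.
rng = np.random.default_rng(1)
n_per_class, p, rho, a, b = 100, 60, 0.95, 7, 41
y = np.repeat([0, 1], n_per_class)
X = rng.standard_normal((2 * n_per_class, p))
sign = np.where(y == 0, 1.0, -1.0)
X[:, b] = sign * rho * X[:, a] + np.sqrt(1 - rho**2) * rng.standard_normal(len(y))

candidates, block_scores = blockwise_search(X, y)
best_acc, f, g = pairwise_search(X, y, candidates)[0]
print(f"best pair: ({f}, {g}), cross-validated QDA accuracy: {best_acc:.2f}")
```

Scoring block pairs rather than all feature pairs cuts the first stage to a few dozen QDA fits in this toy setting, and the exhaustive pairwise pass is then confined to the features of the retained blocks. A linear classifier would not separate this pattern, since the classes differ only in covariance, which is the kind of signal QDA is suited to.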
format Online
Article
Text
id pubmed-3247087
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3247087 2011-12-29 A new method for identifying bivariate differential expression in high dimensional microarray data using quadratic discriminant analysis Arevalillo, Jorge M Navarro, Hilario BMC Bioinformatics Proceedings BioMed Central 2011-11-24 /pmc/articles/PMC3247087/ /pubmed/22168481 http://dx.doi.org/10.1186/1471-2105-12-S12-S6 Text en Copyright ©2011 Arevalillo and Navarro; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings