
A framework for significance analysis of gene expression data using dimension reduction methods

BACKGROUND: The most popular methods for significance analysis on microarray data are well suited to find genes differentially expressed across predefined categories. However, identification of features that correlate with continuous dependent variables is more difficult using these methods, and lon...


Bibliographic Details
Main authors: Gidskehaug, Lars, Anderssen, Endre, Flatberg, Arnar, Alsberg, Bjørn K
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194745/
https://www.ncbi.nlm.nih.gov/pubmed/17877799
http://dx.doi.org/10.1186/1471-2105-8-346
_version_ 1782147687560249344
author Gidskehaug, Lars
Anderssen, Endre
Flatberg, Arnar
Alsberg, Bjørn K
collection PubMed
description BACKGROUND: The most popular methods for significance analysis on microarray data are well suited to find genes differentially expressed across predefined categories. However, identification of features that correlate with continuous dependent variables is more difficult using these methods, and long lists of significant genes returned are not easily probed for co-regulations and dependencies. Dimension reduction methods are much used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretation strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysis of expression data that combines significance testing with the interpretative advantages of the dimension reduction methods. This approach is applicable both for explorative analysis and for classification and regression problems. RESULTS: Three public data sets are analysed. One is used for classification, one contains spiked-in transcripts of known concentrations, and one represents a regression problem with several measured responses. Model-based significance analysis is performed using a modified version of Hotelling's T²-test, and a false discovery rate significance level is estimated by resampling. Our results show that underlying biological phenomena and unknown relationships in the data can be detected by a simple visual interpretation of the model parameters. It is also found that measured phenotypic responses may model the expression data more accurately than if the design-parameters are used as input. For the classification data, our method finds much the same genes as the standard methods, in addition to some extra which are shown to be biologically relevant. The list of spiked-in genes is also reproduced with high accuracy. CONCLUSION: The dimension reduction methods are versatile tools that may also be used for significance testing. Visual inspection of model components is useful for interpretation, and the methodology is the same whether the goal is classification, prediction of responses, feature selection or exploration of a data set. The presented framework is conceptually and algorithmically simple, and a Matlab toolbox (Mathworks Inc, USA) is supplemented.
format Text
id pubmed-2194745
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2194745 2008-01-14 A framework for significance analysis of gene expression data using dimension reduction methods Gidskehaug, Lars Anderssen, Endre Flatberg, Arnar Alsberg, Bjørn K BMC Bioinformatics Research Article BioMed Central 2007-09-18 /pmc/articles/PMC2194745/ /pubmed/17877799 http://dx.doi.org/10.1186/1471-2105-8-346 Text en Copyright © 2007 Gidskehaug et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A framework for significance analysis of gene expression data using dimension reduction methods
topic Research Article