
Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data

BACKGROUND: Visualization of DNA microarray data in two or three dimensional spaces is an important exploratory analysis step in order to detect quality issues or to generate new hypotheses. Principal Component Analysis (PCA) is a widely used linear method to define the mapping between the high-dime...


Bibliographic Details
Main authors: Bartenhagen, Christoph, Klein, Hans-Ulrich, Ruckert, Christian, Jiang, Xiaoyi, Dugas, Martin
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2998530/
https://www.ncbi.nlm.nih.gov/pubmed/21087509
http://dx.doi.org/10.1186/1471-2105-11-567
author Bartenhagen, Christoph
Klein, Hans-Ulrich
Ruckert, Christian
Jiang, Xiaoyi
Dugas, Martin
collection PubMed
description BACKGROUND: Visualization of DNA microarray data in two- or three-dimensional spaces is an important exploratory analysis step for detecting quality issues or generating new hypotheses. Principal Component Analysis (PCA) is a widely used linear method to define the mapping between the high-dimensional data and its low-dimensional representation. During the last decade, many new nonlinear methods for dimension reduction have been proposed, but it is still unclear how well these methods capture the underlying structure of microarray gene expression data. In this study, we assessed the performance of the PCA approach and of six nonlinear dimension reduction methods, namely Kernel PCA, Locally Linear Embedding, Isomap, Diffusion Maps, Laplacian Eigenmaps and Maximum Variance Unfolding, in terms of visualization of microarray data. RESULTS: A systematic benchmark, consisting of Support Vector Machine classification, cluster validation and noise evaluations, was applied to ten microarray and several simulated datasets. Significant differences between PCA and most of the nonlinear methods were observed in two- and three-dimensional target spaces. With an increasing number of dimensions and an increasing number of differentially expressed genes, all methods showed similar performance. PCA and Diffusion Maps were less sensitive to noise than the other nonlinear methods. CONCLUSIONS: Locally Linear Embedding and Isomap showed superior performance on all datasets. In very low-dimensional representations and with few differentially expressed genes, these two methods preserve more of the underlying structure of the data than PCA, and thus are favorable alternatives for the visualization of microarray data.
format Text
id pubmed-2998530
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2998530 2011-01-05 BMC Bioinformatics Research Article BioMed Central 2010-11-18 /pmc/articles/PMC2998530/ /pubmed/21087509 http://dx.doi.org/10.1186/1471-2105-11-567 Text en Copyright ©2010 Bartenhagen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data
topic Research Article