
Independent component analysis of Alzheimer's DNA microarray gene expression data

Bibliographic details
Main authors: Kong, Wei; Mou, Xiaoyang; Liu, Qingzhong; Chen, Zhongxue; Vanderburg, Charles R; Rogers, Jack T; Huang, Xudong
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646728/
https://www.ncbi.nlm.nih.gov/pubmed/19173745
http://dx.doi.org/10.1186/1750-1326-4-5

Collection: PubMed
Record ID: pubmed-2646728
Record date: 2009-02-24
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Mol Neurodegener
Article type: Methodology
Published: BioMed Central, 2009-01-28

Abstract:
BACKGROUND: Gene microarray technology is an effective tool for investigating the simultaneous activity of multiple cellular pathways across hundreds to thousands of genes. However, the massive data sets generated by DNA microarrays are usually complex, noisy and high-dimensional, and are often hindered by low statistical power, which makes them difficult to exploit. To overcome these problems, two unsupervised methods for analyzing microarray data have been developed: principal component analysis (PCA) and independent component analysis (ICA). PCA projects the data into a new space spanned by mutually orthonormal principal components. However, the mutual-orthogonality constraint and the reliance on second-order statistics in PCA algorithms may not be appropriate for the biological systems under study, because extracting and characterizing the most informative features of biological signals requires higher-order statistics.

RESULTS: ICA is an unsupervised algorithm that can extract higher-order statistical structure from data, and it has been applied to DNA microarray gene expression analysis. We applied the FastICA method to DNA microarray gene expression data from Alzheimer's disease (AD) hippocampal tissue samples and then clustered the genes. Experimental results showed that ICA can improve the clustering of AD samples and identify significant genes. More than 50 significant genes with high expression levels in severe AD were extracted, representing immunity-related proteins, metal-related proteins, membrane proteins, lipoproteins, neuropeptides, cytoskeletal proteins, cellular binding proteins and ribosomal proteins. Within these categories, our method also found 37 significant genes with low expression levels. Notably, some oncogenes and phosphorylation-related proteins are expressed at low levels. Compared with PCA and support vector machine recursive feature elimination (SVM-RFE), both widely used in microarray data analysis, ICA can identify more AD-related genes. Furthermore, we identified and validated many genes associated with AD pathogenesis.

CONCLUSION: We demonstrated that ICA exploits higher-order statistics to represent gene expression profiles as linear combinations of elementary expression patterns, which can be used to construct potential AD-related pathogenic pathways. Our computational results also showed that the ICA model outperformed PCA and SVM-RFE. As a microarray data analysis tool, ICA can help elucidate the molecular taxonomy of AD and other multifactorial, polygenic complex diseases.

Copyright © 2009 Kong et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
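The abstract contrasts PCA, which relies on second-order statistics and orthogonal components, with FastICA, which exploits higher-order statistics. What follows is a minimal sketch of the generic FastICA fixed-point algorithm (deflation scheme, tanh contrast), not the authors' actual pipeline. PCA whitening supplies the second-order step, and the tanh-based fixed-point iteration then rotates the whitened data toward maximally non-Gaussian, independent directions. The function names `whiten` and `fastica`, the synthetic sources and the mixing matrix `A` are hypothetical. With microarray data one might place arrays in the rows of `X` and genes in the columns, but that layout is also an assumption.

```python
import numpy as np


def whiten(X):
    """PCA whitening (second-order statistics only).

    Centers each row of X (shape: n_signals x n_observations) and returns
    data whose sample covariance is the identity matrix.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Project onto the principal axes and rescale each to unit variance.
    return np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ Xc


def fastica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA with the tanh (log-cosh) nonlinearity.

    Returns estimated sources of shape (n_components, n_observations),
    recoverable only up to permutation, sign and scale.
    """
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    W = np.zeros((n_components, Z.shape[0]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ Z)
            # Fixed-point step: E[z g(w^T z)] - E[g'(w^T z)] w, with g' = 1 - tanh^2.
            w_new = (Z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            # Gram-Schmidt: keep w orthogonal to the components already found.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z


# Hypothetical synthetic example: two non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(42)
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sign(np.sin(3 * t)), rng.laplace(size=t.size)])
A = np.array([[1.0, 0.5], [0.7, 1.2]])  # hypothetical mixing matrix
X = A @ S

S_hat = fastica(X, n_components=2)
# |correlation| between each true source (rows) and each estimate (columns).
corr = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(corr.round(2))
```

ICA recovers sources only up to order, sign and scale, so the example compares absolute correlations rather than raw values. In practice, a maintained implementation such as scikit-learn's `FastICA` would usually be preferable to this sketch.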