
Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data

ABSTRACT: BACKGROUND: The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relat...


Bibliographic Details
Main authors: Kong, Wei, Mou, Xiaoyang, Hu, Xiaohua
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203370/
https://www.ncbi.nlm.nih.gov/pubmed/21989140
http://dx.doi.org/10.1186/1471-2105-12-S5-S7
_version_ 1782215110442352640
author Kong, Wei
Mou, Xiaoyang
Hu, Xiaohua
author_facet Kong, Wei
Mou, Xiaoyang
Hu, Xiaohua
author_sort Kong, Wei
collection PubMed
description ABSTRACT: BACKGROUND: The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard to identify because the data are complex, noisy, and high-dimensional, and analyses are often hindered by low statistical power. The main challenge now is to extract valuable biological information from the colossal amount of data to gain insight into biological processes and the mechanisms of human disease. Overcoming this challenge requires mathematical and computational methods that are versatile enough to capture the underlying biological features and simple enough to be applied efficiently to large datasets. METHODS: Unsupervised machine learning approaches provide new and efficient analysis of gene expression profiles. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are integrated to identify significant genes and related pathways in a microarray gene expression dataset of Alzheimer’s disease. The advantage of these two approaches is that they can be performed as biclustering methods, by which genes and conditions are clustered simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The difference between the two methods is that ICA assumes statistical independence of the expression modes, while NMF needs positivity constraints to generate localized gene expression profiles. RESULTS: In our work, we applied the FastICA and non-smooth NMF methods, respectively, to DNA microarray gene expression data of Alzheimer’s disease. The simulation results show that both methods can clearly separate severe AD samples from control samples, and the biological analysis of the identified significant genes and their related pathways demonstrated that these genes play a prominent role in AD and relate the activation patterns to AD phenotypes. The combination of these two methods is validated as efficient. CONCLUSIONS: Unsupervised matrix factorization methods provide efficient tools to analyze high-throughput microarray datasets. Because different unsupervised approaches explore correlations in the high-dimensional data space and identify relevant subspaces based on different hypotheses, integrating these methods to explore the underlying biological information from microarray datasets is an efficient approach. By combining the significant genes identified by both ICA and NMF, the biological analysis proves highly efficient for elucidating the molecular taxonomy of Alzheimer’s disease and enables better experimental design to further identify potential pathways and therapeutic targets of AD.
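The biclustering idea in the abstract above (factor the expression matrix so that genes and samples are grouped at the same time) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses plain NMF with Lee-Seung multiplicative updates rather than the non-smooth NMF named in the abstract, it omits the ICA step, and the toy matrix `V`, the helper names `nmf` and `top_genes`, and all sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def nmf(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate nonnegative V (genes x samples) as W @ H.

    Plain NMF with multiplicative updates (an assumption for this sketch,
    not the non-smooth NMF variant mentioned in the abstract).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps    # gene loadings ("metagenes")
    H = rng.random((k, n_samples)) + eps  # metagene activity per sample
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def top_genes(W, component, n_top):
    """Indices of the n_top genes with the largest loading on one metagene."""
    return np.argsort(W[:, component])[::-1][:n_top]

# Synthetic toy data, fabricated purely for illustration:
# 60 genes x 8 samples with two planted gene/sample blocks.
rng = np.random.default_rng(1)
V = rng.random((60, 8)) * 0.1   # low background expression
V[:10, 4:] += 2.0               # genes 0-9 elevated in samples 4-7
V[10:20, :4] += 2.0             # genes 10-19 elevated in samples 0-3

W, H = nmf(V, k=2)

# Biclustering: each sample goes to its dominant metagene,
# and each metagene is described by its highest-loading genes.
sample_cluster = H.argmax(axis=0)
gene_sets = [set(top_genes(W, c, 10).tolist()) for c in range(2)]
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print("sample clusters:", sample_cluster)
print("relative reconstruction error:", round(float(rel_error), 3))
```

On real microarray data the matrix would first need to be made nonnegative, the number of components would have to be chosen, and the selected genes would still require the kind of biological validation the abstract describes.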
format Online
Article
Text
id pubmed-3203370
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3203370 2011-10-29 Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data Kong, Wei Mou, Xiaoyang Hu, Xiaohua BMC Bioinformatics Proceedings BioMed Central 2011-07-27 /pmc/articles/PMC3203370/ /pubmed/21989140 http://dx.doi.org/10.1186/1471-2105-12-S5-S7 Text en Copyright ©2011 Kong et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Kong, Wei
Mou, Xiaoyang
Hu, Xiaohua
Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data
title Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203370/
https://www.ncbi.nlm.nih.gov/pubmed/21989140
http://dx.doi.org/10.1186/1471-2105-12-S5-S7