
Biclustering of gene expression data by non-smooth non-negative matrix factorization

BACKGROUND: The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datas...


Bibliographic Details
Main authors:	Carmona-Saez, Pedro; Pascual-Marqui, Roberto D; Tirado, F; Carazo, Jose M; Pascual-Montano, Alberto
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1434777/ https://www.ncbi.nlm.nih.gov/pubmed/16503973 http://dx.doi.org/10.1186/1471-2105-7-78

author	Carmona-Saez, Pedro Pascual-Marqui, Roberto D Tirado, F Carazo, Jose M Pascual-Montano, Alberto
collection	PubMed
description	BACKGROUND: The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed of sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated with different physiological states. RESULTS: In this work we present a methodology able to cluster genes and conditions that are highly related in sub-portions of the data. Our approach is based on a new data mining technique, Non-smooth Non-Negative Matrix Factorization (nsNMF), which is able to identify localized patterns in large datasets. We assessed the potential of this methodology by analyzing several synthetic datasets as well as two large and heterogeneous sets of gene expression profiles. In all cases the method was able to identify localized features related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The uncovered structures showed a clear biological meaning in terms of relationships among functional annotations of genes and the phenotypes or physiological states of the associated conditions. CONCLUSION: The proposed approach can be a useful tool to analyze large and heterogeneous gene expression datasets. The method is able to identify complex relationships among genes and conditions that are difficult to identify by standard clustering algorithms.
format	Text
id	pubmed-1434777
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1434777 2006-04-21 Biclustering of gene expression data by non-smooth non-negative matrix factorization Carmona-Saez, Pedro Pascual-Marqui, Roberto D Tirado, F Carazo, Jose M Pascual-Montano, Alberto BMC Bioinformatics Methodology Article
BioMed Central 2006-02-17 /pmc/articles/PMC1434777/ /pubmed/16503973 http://dx.doi.org/10.1186/1471-2105-7-78 Text en Copyright © 2006 Carmona-Saez et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Biclustering of gene expression data by non-smooth non-negative matrix factorization
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1434777/ https://www.ncbi.nlm.nih.gov/pubmed/16503973 http://dx.doi.org/10.1186/1471-2105-7-78
