Clustering gene expression data with a penalized graph-based metric

Bibliographic Details
Main authors:	Bayá, Ariel E; Granitto, Pablo M
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023695/ https://www.ncbi.nlm.nih.gov/pubmed/21205299 http://dx.doi.org/10.1186/1471-2105-12-2

author	Bayá, Ariel E Granitto, Pablo M
collection	PubMed
description	BACKGROUND: The search for cluster structure in microarray datasets is a fundamental problem in the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e., data that do not form compact clouds of points but instead trace arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets. RESULTS: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first, it constructs the k-Nearest-Neighbor-Graph of the dataset using a low value of k; then, it adds edges with highly penalized weights to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as several penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric. CONCLUSIONS: In all cases, the PKNNG metric shows promising clustering results. Using the PKNNG metric can raise the performance of commonly used pairwise-distance-based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
format	Text
id	pubmed-3023695
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3023695 2011-01-20 Clustering gene expression data with a penalized graph-based metric Bayá, Ariel E Granitto, Pablo M BMC Bioinformatics Methodology Article
BioMed Central 2011-01-04 /pmc/articles/PMC3023695/ /pubmed/21205299 http://dx.doi.org/10.1186/1471-2105-12-2 Text en Copyright ©2011 Bayá and Granitto; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Clustering gene expression data with a penalized graph-based metric
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023695/ https://www.ncbi.nlm.nih.gov/pubmed/21205299 http://dx.doi.org/10.1186/1471-2105-12-2
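The two-step procedure described in the abstract can be illustrated with a short, dependency-free Python sketch. It is not the authors' reference implementation, and several choices the record leaves open are assumptions here: Euclidean base distances, a symmetrized kNN graph, joining every pair of subgraphs through its closest pair of points (the paper discusses several connection schemes), an exponential penalty d·exp(d/μ) with μ the mean kNN edge length (the paper discusses several penalization functions), and shortest-path (geodesic) distances on the resulting connected graph. The helper name `pknng_distances` and the toy data are hypothetical.

```python
import heapq
import math


def pknng_distances(points, k=3):
    """Sketch of a PKNNG-style distance matrix (hypothetical helper).

    Assumed choices: Euclidean base distance, a symmetrized kNN graph,
    each pair of subgraphs joined through its closest pair of points,
    and the penalty d * exp(d / mu), where mu is the mean kNN edge length.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: k-nearest-neighbor graph with a low k (edges made undirected).
    adj = [{} for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in others[:k]:
            adj[i][j] = adj[j][i] = d[i][j]

    # Label the connected components (subgraphs) left by step 1.
    comp = [-1] * n
    n_comp = 0
    for start in range(n):
        if comp[start] != -1:
            continue
        comp[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Step 2: connect every pair of subgraphs through its closest pair of
    # points, using a heavily penalized weight (assumed exponential penalty;
    # math.exp overflows if d / mu exceeds roughly 709).
    edge_lengths = [w for u in range(n) for w in adj[u].values()]
    mu = sum(edge_lengths) / len(edge_lengths)
    for a in range(n_comp):
        for b in range(a + 1, n_comp):
            i, j = min(
                ((i, j) for i in range(n) if comp[i] == a
                 for j in range(n) if comp[j] == b),
                key=lambda pair: d[pair[0]][pair[1]],
            )
            adj[i][j] = adj[j][i] = d[i][j] * math.exp(d[i][j] / mu)

    # Distances are shortest-path (geodesic) lengths on the connected graph,
    # computed with Dijkstra's algorithm from every node.
    dist = []
    for s in range(n):
        best = [math.inf] * n
        best[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > best[u]:
                continue
            for v, w in adj[u].items():
                if du + w < best[v]:
                    best[v] = du + w
                    heapq.heappush(heap, (best[v], v))
        dist.append(best)
    return dist


# Toy data (not from the paper): two tight groups on a line, far apart.
pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
D = pknng_distances(pts, k=1)
print(D[0][2], D[0][5])  # within-group distance vs. penalized cross-group one
```

With k=1 the toy data split into two subgraphs, so the only path between groups crosses the single penalized edge, and cross-group distances grow far faster than Euclidean ones. The resulting matrix can then be handed to any pairwise-distance clustering method; with SciPy, for instance, it could be converted to condensed form with `scipy.spatial.distance.squareform` (using `checks=False` if round-off leaves it marginally asymmetric) and passed to `scipy.cluster.hierarchy.linkage` to build a dendrogram. The all-pairs Dijkstra loop keeps the sketch self-contained but scales poorly; a sparse-graph routine such as `scipy.sparse.csgraph.shortest_path` would be the more usual choice for real datasets.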
