
CLAG: an unsupervised non hierarchical clustering algorithm handling biological data

BACKGROUND: Searching for similarities in a set of biological data is intrinsically difficult due to possible data points that should not be clustered, or that should group within several clusters. Under these hypotheses, hierarchical agglomerative clustering is not appropriate. Moreover, if the dat...


Bibliographic Details
Main authors:	Dib, Linda; Carbone, Alessandra
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3519615/ https://www.ncbi.nlm.nih.gov/pubmed/23216858 http://dx.doi.org/10.1186/1471-2105-13-194

author	Dib, Linda Carbone, Alessandra
collection	PubMed
description	BACKGROUND: Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others should belong to several clusters. Under these conditions, hierarchical agglomerative clustering is not appropriate. Moreover, if the dataset is not well understood, as is often the case, supervised classification is not appropriate either. RESULTS: CLAG (for CLusters AGgregation) is an unsupervised non hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require all data points to cluster, and it converges, yielding the same result at each run. Its simplicity and speed allow it to run on reasonably large datasets. CONCLUSIONS: CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It proved more informative and accurate than several known clustering methods, such as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering, and affinity propagation clustering, and it does not suffer from the convergence problem of the latter.
format	Online Article Text
id	pubmed-3519615
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3519615 2012-12-12 BMC Bioinformatics BioMed Central 2012-08-08 /pmc/articles/PMC3519615/ /pubmed/23216858 http://dx.doi.org/10.1186/1471-2105-13-194 Text en Copyright ©2012 Dib and Carbone; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	CLAG: an unsupervised non hierarchical clustering algorithm handling biological data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3519615/ https://www.ncbi.nlm.nih.gov/pubmed/23216858 http://dx.doi.org/10.1186/1471-2105-13-194
