
densityCut: an efficient and versatile topological approach for automatic clustering of biological data

Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient...

Bibliographic Details
Main authors:	Ding, Jiarui; Shah, Sohrab; Condon, Anne
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013902/ https://www.ncbi.nlm.nih.gov/pubmed/27153661 http://dx.doi.org/10.1093/bioinformatics/btw227

_version_	1782452236148801536
author	Ding, Jiarui Shah, Sohrab Condon, Anne
collection	PubMed
description	Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on the recent cancer mutation clustering and single-cell data analyses, namely to cluster variant allele frequencies of somatic mutations to reveal clonal architectures of individual tumours, to cluster single-cell gene expression data to uncover cell population compositions, and to cluster single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets. Availability and Implementation: Data and the densityCut R package are available from https://bitbucket.org/jerry00/densitycut_dev. Contact: condon@cs.ubc.ca or sshah@bccrc.ca or jiaruid@cs.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-5013902
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5013902 2016-09-12 densityCut: an efficient and versatile topological approach for automatic clustering of biological data Ding, Jiarui Shah, Sohrab Condon, Anne Bioinformatics Original Papers
Oxford University Press 2016-09-01 2016-04-23 /pmc/articles/PMC5013902/ /pubmed/27153661 http://dx.doi.org/10.1093/bioinformatics/btw227 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	densityCut: an efficient and versatile topological approach for automatic clustering of biological data
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013902/ https://www.ncbi.nlm.nih.gov/pubmed/27153661 http://dx.doi.org/10.1093/bioinformatics/btw227
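
The abstract in this record outlines three steps: a rough density estimate from a K-nearest neighbour graph, a refinement of those densities by a random walk, and assignment of each point to the basin of attraction of a density mode. The Python sketch below illustrates that general idea only. It is not the densityCut R package and does not reproduce the authors' exact formulas. The function names, the restart weight `alpha`, the iteration count, and the rule "point to the densest neighbour" are all assumptions made for illustration, and the cluster-merging and stability-based selection steps are left out.

```python
import numpy as np


def knn_indices(X, k):
    """Return the k nearest neighbours of each point and the distance to the k-th one."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                         # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]
    dist_k = np.sqrt(d2[np.arange(len(X)), idx[:, -1]])
    return idx, dist_k


def knn_density(dist_k, k, dim):
    """Rough kNN density estimate, up to a constant: k / (n * r_k^dim)."""
    return k / (len(dist_k) * dist_k ** dim)


def refine_density(density, idx, alpha=0.9, n_iter=50):
    """Smooth densities with a random walk with restart on the kNN graph (assumed scheme)."""
    n, k = idx.shape
    P = np.zeros((n, n))
    P[np.repeat(np.arange(n), k), idx.ravel()] = 1.0 / k  # row-stochastic transitions
    d0 = density / density.sum()
    d = d0.copy()
    for _ in range(n_iter):
        d = alpha * (P.T @ d) + (1.0 - alpha) * d0
    return d


def assign_to_modes(density, idx):
    """Link each point to the densest of itself and its neighbours, then follow the links."""
    n = len(density)
    cand = np.hstack([np.arange(n)[:, None], idx])        # self first, so ties keep self
    parent = cand[np.arange(n), np.argmax(density[cand], axis=1)]
    root = parent.copy()
    while True:                                           # density strictly rises along links
        nxt = parent[root]
        if np.array_equal(nxt, root):
            break
        root = nxt
    _, labels = np.unique(root, return_inverse=True)
    return labels


# Usage on two well-separated synthetic blobs (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(20.0, 1.0, (50, 2))])
idx, dist_k = knn_indices(X, k=10)
density = refine_density(knn_density(dist_k, k=10, dim=2), idx)
labels = assign_to_modes(density, idx)
print(len(set(labels)), "clusters found before any merging step")
```

Without a merging step, this sketch may split one blob into several small clusters around local density peaks. That over-segmentation is the kind of output the post-processing step described in the abstract is meant to consolidate.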
