A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

BACKGROUND: While there are a large number of bioinformatics datasets for clustering, many of them are incomplete, i.e., missing attribute values in some data samples needed by clustering algorithms. A variety of clustering algorithms have been proposed in the past years, but they usually are limite...

Bibliographic Details
Main Authors:	Liao, Longlong, Li, Kenli, Li, Keqin, Yang, Canqun, Tian, Qi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6249732/ https://www.ncbi.nlm.nih.gov/pubmed/30463619 http://dx.doi.org/10.1186/s12918-018-0630-6

author	Liao, Longlong Li, Kenli Li, Keqin Yang, Canqun Tian, Qi
author_facet	Liao, Longlong Li, Kenli Li, Keqin Yang, Canqun Tian, Qi
author_sort	Liao, Longlong
collection	PubMed
description	BACKGROUND: While there are a large number of bioinformatics datasets for clustering, many of them are incomplete, i.e., missing attribute values in some data samples needed by clustering algorithms. A variety of clustering algorithms have been proposed in the past years, but they usually are limited to cluster on the complete dataset. Besides, conventional clustering algorithms cannot obtain a trade-off between accuracy and efficiency of the clustering process since many essential parameters are determined by the human user’s experience. RESULTS: The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets called MKDCI. The MKDCI algorithm consists of recovering missing attribute values of input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters with arbitrary shape and visualizing the results. CONCLUSIONS: Extensive experiments on several well-known clustering datasets in bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, the proposed MKDCI algorithm tends to automatically produce clusters of better quality on the incomplete dataset in bioinformatics.
format	Online Article Text
id	pubmed-6249732
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6249732 2018-11-26 BMC Syst Biol Research BioMed Central 2018-11-22 /pmc/articles/PMC6249732/ /pubmed/30463619 http://dx.doi.org/10.1186/s12918-018-0630-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6249732/ https://www.ncbi.nlm.nih.gov/pubmed/30463619 http://dx.doi.org/10.1186/s12918-018-0630-6
