K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data

Bibliographic Details
Main Authors:	Wang, Kai; Zhao, Qing; Lu, Jianwei; Yu, Tianwei
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation, 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4538770/ https://www.ncbi.nlm.nih.gov/pubmed/26339652 http://dx.doi.org/10.1155/2015/918954

Abstract

With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns in, and understanding, such massive data is the application of clustering techniques. Nonlinear relations, which have been largely unexploited compared with linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model biological relationships more precisely and reflect critical patterns in biological systems. Using the general dependency measure Distance Based on Conditional Ordered List (DCOL), which we introduced previously, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not affect the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
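The abstract describes K-profiles only at a high level. As a hedged illustration of its assignment idea, the sketch below assumes a DCOL definition along the lines of the authors' earlier work: order the samples by a profile x, then sum the absolute differences between consecutive values of a gene y. Each gene is assigned to the profile with the smallest such distance. The permutation test that leaves a gene unassigned (label -1) is a hypothetical stand-in for the paper's built-in testing procedure, not the published procedure. The names `dcol`, `permutation_pvalue`, and `assign_to_profiles` and the parameters `alpha` and `n_perm` are illustrative only. The profile-update step of the full algorithm is not shown.

```python
import numpy as np


def dcol(y, x):
    """DCOL-style distance of y given x (assumed definition).

    Sorts the samples by x and sums the absolute differences between
    consecutive y values. A small value means y changes smoothly as x
    increases, whether the relation is linear or not.
    """
    order = np.argsort(x, kind="stable")
    y_sorted = np.asarray(y, dtype=float)[order]
    return float(np.sum(np.abs(np.diff(y_sorted))))


def permutation_pvalue(y, x, n_perm=999, rng=None):
    """Hypothetical test: fraction of random reorderings of y whose DCOL is at
    least as small as the observed one (with the usual +1 correction)."""
    rng = np.random.default_rng(rng)
    observed = dcol(y, x)
    null = np.array([dcol(rng.permutation(y), x) for _ in range(n_perm)])
    return (1 + np.sum(null <= observed)) / (n_perm + 1)


def assign_to_profiles(genes, profiles, alpha=0.05, n_perm=999, seed=0):
    """Assign each gene (row of `genes`) to the profile with the smallest DCOL.

    A gene whose best profile is not significant at level `alpha` gets the
    label -1, mirroring the abstract's point that genes outside every cluster
    should not affect the cluster profiles.
    """
    rng = np.random.default_rng(seed)
    labels = []
    for g in np.asarray(genes, dtype=float):
        dists = [dcol(g, prof) for prof in profiles]
        best = int(np.argmin(dists))
        pval = permutation_pvalue(g, profiles[best], n_perm, rng)
        labels.append(best if pval < alpha else -1)
    return np.array(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(-2, 2, 50)
    profiles = [x, np.sin(3 * x)]  # hypothetical cluster profiles
    genes = np.vstack([
        x**2 + 0.1 * rng.normal(size=50),  # U-shaped function of profile 0
        rng.normal(size=50),               # pure noise, unrelated to any profile
    ])
    print(assign_to_profiles(genes, profiles))
```

DCOL only rewards y changing smoothly as x increases. As a result, the U-shaped gene in the usage example can be matched to the linear profile even though its Pearson correlation with that profile is close to zero. This is the kind of nonlinear relation that, according to the abstract, linear K-means does not exploit.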

Journal:	Biomed Res Int
Date:	2015-08-03
Rights:	Copyright © 2015 Kai Wang et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.