
ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use

BACKGROUND: During the last decade, the use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data. A common technique used to organize and analyze microarray data is to perform cluster analysis. While many clustering algorithms have been devel...


Bibliographic Details
Main Authors:	Kraj, Piotr; Sharma, Ashok; Garge, Nikhil; Podolsky, Robert; McIndoe, Richard A
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375128/ https://www.ncbi.nlm.nih.gov/pubmed/18416829 http://dx.doi.org/10.1186/1471-2105-9-200

author	Kraj, Piotr; Sharma, Ashok; Garge, Nikhil; Podolsky, Robert; McIndoe, Richard A
collection	PubMed
description	BACKGROUND: During the last decade, the use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data. A common technique used to organize and analyze microarray data is to perform cluster analysis. While many clustering algorithms have been developed, they all suffer a significant decrease in computational performance as the size of the dataset being analyzed becomes very large. For example, clustering 10000 genes from an experiment containing 200 microarrays can be quite time-consuming and challenging on a desktop PC. One solution to the scalability problem of clustering algorithms is to distribute or parallelize the algorithm across multiple computers. RESULTS: The software described in this paper is a high-performance multithreaded application that implements a parallelized version of the K-means clustering algorithm. Most parallel processing applications are not accessible to the general public and require specialized software libraries (e.g. MPI) and specialized hardware configurations. The parallel nature of the application comes from the use of a web service to perform the distance calculations and cluster assignments. Here we show our parallel implementation provides significant performance gains over a wide range of datasets using as few as seven nodes. The software was written in C# and was designed in a modular fashion to provide both deployment flexibility and flexibility in the user interface. CONCLUSION: ParaKMeans was designed to provide the general scientific community with an easy and manageable client-server application that can be installed on a wide variety of Windows operating systems.
format	Text
id	pubmed-2375128
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2375128 2008-05-12 ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use Kraj, Piotr; Sharma, Ashok; Garge, Nikhil; Podolsky, Robert; McIndoe, Richard A BMC Bioinformatics Software
BioMed Central 2008-04-16 /pmc/articles/PMC2375128/ /pubmed/18416829 http://dx.doi.org/10.1186/1471-2105-9-200 Text en Copyright © 2008 Kraj et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375128/ https://www.ncbi.nlm.nih.gov/pubmed/18416829 http://dx.doi.org/10.1186/1471-2105-9-200
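
The abstract in this record says the parallelism comes from handing the distance calculations and cluster assignments to remote nodes. The sketch below illustrates that general division of labor and is not the authors' C# implementation: the function names `assign_chunk` and `parallel_kmeans` are hypothetical, threads in a single process stand in for the web service nodes, and Euclidean distance is an assumed metric.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def assign_chunk(chunk, centroids):
    """Worker step: assign each point in the chunk to its nearest centroid.

    Returns the labels plus per-cluster coordinate sums and counts, so the
    coordinator can recompute centroids without revisiting the points.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    for point in chunk:
        # Euclidean distance to every centroid; keep the index of the closest.
        nearest = min(range(k), key=lambda c: math.dist(point, centroids[c]))
        labels.append(nearest)
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return labels, sums, counts


def parallel_kmeans(points, centroids, n_workers=4, max_iter=100):
    """Coordinator: split the data, farm out assignments, merge the results."""
    # One contiguous chunk per worker (each worker plays the role of a node).
    size = max(1, math.ceil(len(points) / n_workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    centroids = [list(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    labels = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            results = list(
                pool.map(assign_chunk, chunks, [centroids] * len(chunks))
            )
            new_labels = [lab for r in results for lab in r[0]]
            if new_labels == labels:
                break  # assignments are stable, so the centroids are too
            # Merge partial sums and counts into updated centroids.
            new_centroids = []
            for c in range(k):
                total = sum(r[2][c] for r in results)
                if total == 0:
                    new_centroids.append(centroids[c])  # empty cluster: keep old
                else:
                    new_centroids.append(
                        [sum(r[1][c][d] for r in results) / total
                         for d in range(dim)]
                    )
            labels, centroids = new_labels, new_centroids
    return labels, centroids
```

Each worker returns only sums and counts per cluster, which keeps the data sent back to the coordinator small regardless of chunk size. In CPython, threads will not speed up this pure-Python arithmetic; a real deployment would replace the executor with calls to separate machines, as the abstract describes.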
