
optCluster: An R Package for Determining the Optimal Clustering Algorithm

There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result i...


Bibliographic Details
Main Authors: Sekula, Michael; Datta, Somnath; Datta, Susmita
Format: Online Article Text
Language: English
Published: Biomedical Informatics 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450252/
https://www.ncbi.nlm.nih.gov/pubmed/28584451
http://dx.doi.org/10.6026/97320630013101
author Sekula, Michael
Datta, Somnath
Datta, Susmita
collection PubMed
description There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a “best” option for a given dataset. The method of weighted rank aggregation is utilized by this package to objectively aggregate various performance measure scores, thereby taking away the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data. AVAILABILITY: This package is available for free through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/optCluster/
format Online
Article
Text
id pubmed-5450252
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-5450252 2017-06-05 optCluster: An R Package for Determining the Optimal Clustering Algorithm Sekula, Michael Datta, Somnath Datta, Susmita Bioinformation Software There exist numerous programs and packages that perform validation for a given clustering solution; however, clustering algorithms fare differently as judged by different validation measures. If more than one performance measure is used to evaluate multiple clustering partitions, an optimal result is often difficult to determine by visual inspection alone. This paper introduces optCluster, an R package that uses a single function to simultaneously compare numerous clustering partitions (created by different algorithms and/or numbers of clusters) and obtain a “best” option for a given dataset. The method of weighted rank aggregation is utilized by this package to objectively aggregate various performance measure scores, thereby taking away the guesswork that often follows a visual inspection of cluster results. The optCluster package contains biological validation measures as well as clustering algorithms developed specifically for RNA sequencing data, making it a useful tool for clustering genomic data. AVAILABILITY: This package is available for free through the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/optCluster/ Biomedical Informatics 2017-03-31 /pmc/articles/PMC5450252/ /pubmed/28584451 http://dx.doi.org/10.6026/97320630013101 Text en © 2017 Biomedical Informatics http://creativecommons.org/licenses/by/3.0/ This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.
title optCluster: An R Package for Determining the Optimal Clustering Algorithm
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450252/
https://www.ncbi.nlm.nih.gov/pubmed/28584451
http://dx.doi.org/10.6026/97320630013101
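
The description in this record says the package aggregates the scores of several validation measures through rank aggregation to pick a single "best" clustering option. The Python sketch below illustrates that general idea only. It is not the optCluster API (optCluster is an R package) and not the authors' algorithm: it assumes each validation measure has already been turned into a ranked list of candidate partitions, uses the plain (unweighted-by-score) Spearman footrule distance, and searches all orderings by brute force, which is only feasible for a handful of candidates. The candidate labels, the `footrule` and `aggregate` helpers, and the `importance` parameter are hypothetical names introduced for illustration.

```python
from itertools import permutations


def footrule(order_a, order_b):
    """Spearman footrule: sum of absolute differences in rank position."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(order_a))


def aggregate(ranked_lists, importance=None):
    """Brute-force rank aggregation (illustrative assumption, not optCluster's method).

    Returns the ordering that minimizes the importance-weighted sum of
    footrule distances to every input ranking, together with that cost.
    """
    if importance is None:
        importance = [1.0] * len(ranked_lists)
    items = sorted(ranked_lists[0])
    best_order, best_cost = None, float("inf")
    for candidate in permutations(items):
        cost = sum(
            w * footrule(candidate, ranking)
            for w, ranking in zip(importance, ranked_lists)
        )
        if cost < best_cost:
            best_order, best_cost = candidate, cost
    return list(best_order), best_cost


# Hypothetical rankings of four candidate partitions (best first),
# one ranked list per validation measure.
rankings = [
    ["kmeans-3", "hier-3", "pam-3", "kmeans-4"],  # measure A
    ["kmeans-3", "pam-3", "hier-3", "kmeans-4"],  # measure B
    ["hier-3", "kmeans-3", "pam-3", "kmeans-4"],  # measure C
]

best_order, cost = aggregate(rankings)
print("Aggregated ordering:", best_order)
print("Top-ranked option:", best_order[0])
```

In this toy input the three measures disagree on the order of the first three candidates, and the aggregated ordering is the one with the smallest total disagreement with all of them. Passing a list such as `importance=[2.0, 1.0, 1.0]` would make the first measure count twice as much in that total.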