
ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data

BACKGROUND: Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure. Although there are many computer programs available for performing clustering, a single web resource that provides several state-of-the-art clusterin...


Bibliographic Details
Main authors: Manjunath, Mohith, Zhang, Yi, Kim, Yeonsung, Yeo, Steve H., Sobh, Omar, Russell, Nathan, Followell, Christian, Bushell, Colleen, Ravaioli, Umberto, Song, Jun S.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6429934/
https://www.ncbi.nlm.nih.gov/pubmed/30906871
http://dx.doi.org/10.7717/peerj-cs.155
author Manjunath, Mohith
Zhang, Yi
Kim, Yeonsung
Yeo, Steve H.
Sobh, Omar
Russell, Nathan
Followell, Christian
Bushell, Colleen
Ravaioli, Umberto
Song, Jun S.
collection PubMed
description BACKGROUND: Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure. Although there are many computer programs available for performing clustering, a single web resource that provides several state-of-the-art clustering methods, interactive visualizations and evaluation of clustering results is lacking. METHODS: ClusterEnG (acronym for Clustering Engine for Genomics) provides a web interface for clustering data and interactive visualizations including 3D views, data selection and zoom features. Eighteen clustering validation measures are also presented to aid the user in selecting a suitable algorithm for their dataset. ClusterEnG also aims at educating the user about the similarities and differences between various clustering algorithms and provides tutorials that demonstrate potential pitfalls of each algorithm. CONCLUSIONS: The web resource will be particularly useful to scientists who are not conversant with computing but want to understand the structure of their data in an intuitive manner. The validation measures facilitate the process of choosing a suitable clustering algorithm among the available options. ClusterEnG is part of a bigger project called KnowEnG (Knowledge Engine for Genomics) and is available at http://education.knoweng.org/clustereng.
format Online
Article
Text
id pubmed-6429934
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6429934 2019-03-22 ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data PeerJ Comput Sci Bioinformatics PeerJ Inc. 2018-05-21 /pmc/articles/PMC6429934/ /pubmed/30906871 http://dx.doi.org/10.7717/peerj-cs.155 Text en ©2018 Manjunath et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data
topic Bioinformatics