Descriptive statistics and visualization of data from the R datasets package with implications for clusterability

The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insights into the clusterability of these datasets. These publicly available datasets are contained in the R software system, a...

Bibliographic Details
Main authors: Brownstein, Naomi C., Adolfsson, Andreas, Ackerman, Margareta
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612012/
https://www.ncbi.nlm.nih.gov/pubmed/31317060
http://dx.doi.org/10.1016/j.dib.2019.104004
_version_ 1783432806879199232
author Brownstein, Naomi C.
Adolfsson, Andreas
Ackerman, Margareta
author_facet Brownstein, Naomi C.
Adolfsson, Andreas
Ackerman, Margareta
author_sort Brownstein, Naomi C.
collection PubMed
description The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insights into the clusterability of these datasets. These publicly available datasets are contained in the R software system, and can be downloaded at https://www.r-project.org/, with documentation provided at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article, "To Cluster or Not to Cluster: An Analysis of Clusterability Methods" (https://doi.org/10.1016/j.patcog.2018.10.026). Brief descriptions and graphs of the variables contained in each dataset are provided in the form of means, extrema, quartiles, standard deviation and standard error. Two-dimensional plots for each pair of variables are provided. Original references to the data sets are included when available. Further, each dataset is reduced to a single dimension by each of two different methods: pairwise distances and principal component analysis. For the latter, only the first component is used. Histograms of the reduced data are included for every dataset using both methods.
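The two one-dimensional reductions named in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy data, the function names, the use of unscaled Euclidean distance, the power-iteration approach to the first principal component, and the equal-width binning are all choices made here for illustration. The record does not say how the article scaled the variables or binned its histograms.

```python
import math
from itertools import combinations


def pairwise_distances(rows):
    """Euclidean distance for every unordered pair of rows (assumed metric)."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def first_component_scores(rows, iterations=500):
    """Scores of each row on the first principal component.

    Uses power iteration on the sample covariance matrix. This assumes the
    data are not constant and that the starting vector is not orthogonal
    to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def histogram_counts(values, bins=5):
    """Counts per equal-width bin; the maximum falls in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    counts = [0] * bins
    for x in values:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts


# Hypothetical toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

dist_values = pairwise_distances(data)      # n * (n - 1) / 2 values
pc1_scores = first_component_scores(data)   # one value per row

print(histogram_counts(dist_values))
print(histogram_counts(pc1_scores))
```

With grouped data like this toy example, both histograms tend to show more than one mode, which is the kind of visual cue for clusterability that the description refers to.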
format Online
Article
Text
id pubmed-6612012
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-66120122019-07-17 Descriptive statistics and visualization of data from the R datasets package with implications for clusterability Brownstein, Naomi C. Adolfsson, Andreas Ackerman, Margareta Data Brief Mathematics The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insights into the clusterability of these datasets. These publicly available datasets are contained in the R software system, and can be downloaded at https://www.r-project.org/, with documentation provided at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article, "To Cluster or Not to Cluster: An Analysis of Clusterability Methods" (https://doi.org/10.1016/j.patcog.2018.10.026). Brief descriptions and graphs of the variables contained in each dataset are provided in the form of means, extrema, quartiles, standard deviation and standard error. Two-dimensional plots for each pair of variables are provided. Original references to the data sets are included when available. Further, each dataset is reduced to a single dimension by each of two different methods: pairwise distances and principal component analysis. For the latter, only the first component is used. Histograms of the reduced data are included for every dataset using both methods. Elsevier 2019-05-24 /pmc/articles/PMC6612012/ /pubmed/31317060 http://dx.doi.org/10.1016/j.dib.2019.104004 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Mathematics
Brownstein, Naomi C.
Adolfsson, Andreas
Ackerman, Margareta
Descriptive statistics and visualization of data from the R datasets package with implications for clusterability
title Descriptive statistics and visualization of data from the R datasets package with implications for clusterability
topic Mathematics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612012/
https://www.ncbi.nlm.nih.gov/pubmed/31317060
http://dx.doi.org/10.1016/j.dib.2019.104004