Descriptive statistics and visualization of data from the R datasets package with implications for clusterability
The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insights into the clusterability of these datasets. These publicly available datasets are contained in the R software system, a...
Main authors: | Brownstein, Naomi C.; Adolfsson, Andreas; Ackerman, Margareta |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2019 |
Subjects: | Mathematics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612012/ https://www.ncbi.nlm.nih.gov/pubmed/31317060 http://dx.doi.org/10.1016/j.dib.2019.104004 |
_version_ | 1783432806879199232 |
---|---|
author | Brownstein, Naomi C. Adolfsson, Andreas Ackerman, Margareta |
author_facet | Brownstein, Naomi C. Adolfsson, Andreas Ackerman, Margareta |
author_sort | Brownstein, Naomi C. |
collection | PubMed |
description | The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insights into the clusterability of these datasets. These publicly available datasets are contained in the R software system, and can be downloaded at https://www.r-project.org/, with documentation provided at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article, To Cluster or Not to Cluster: An Analysis of Clusterability Methods? (https://doi.org/10.1016/j.patcog.2018.10.026). Brief descriptions and graphs of the variables contained in each dataset are provided in the form of means, extrema, quartiles, standard deviation and standard error. Two-dimensional plots for each pair of variables are provided. Original references to the data sets are included when available. Further, each dataset is reduced to a single dimension by each of two different methods: pairwise distances and principal component analysis. For the latter, only the first component is used. Histograms of the reduced data are included for every dataset using both methods. |
format | Online Article Text |
id | pubmed-6612012 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-6612012 2019-07-17 Descriptive statistics and visualization of data from the R datasets package with implications for clusterability Brownstein, Naomi C. Adolfsson, Andreas Ackerman, Margareta Data Brief Mathematics The manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insights into the clusterability of these datasets. These publicly available datasets are contained in the R software system, and can be downloaded at https://www.r-project.org/, with documentation provided at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article, To Cluster or Not to Cluster: An Analysis of Clusterability Methods? (https://doi.org/10.1016/j.patcog.2018.10.026). Brief descriptions and graphs of the variables contained in each dataset are provided in the form of means, extrema, quartiles, standard deviation and standard error. Two-dimensional plots for each pair of variables are provided. Original references to the data sets are included when available. Further, each dataset is reduced to a single dimension by each of two different methods: pairwise distances and principal component analysis. For the latter, only the first component is used. Histograms of the reduced data are included for every dataset using both methods. Elsevier 2019-05-24 /pmc/articles/PMC6612012/ /pubmed/31317060 http://dx.doi.org/10.1016/j.dib.2019.104004 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Mathematics Brownstein, Naomi C. Adolfsson, Andreas Ackerman, Margareta Descriptive statistics and visualization of data from the R datasets package with implications for clusterability |
title | Descriptive statistics and visualization of data from the R datasets package with implications for clusterability |
title_full | Descriptive statistics and visualization of data from the R datasets package with implications for clusterability |
title_fullStr | Descriptive statistics and visualization of data from the R datasets package with implications for clusterability |
title_full_unstemmed | Descriptive statistics and visualization of data from the R datasets package with implications for clusterability |
title_short | Descriptive statistics and visualization of data from the R datasets package with implications for clusterability |
title_sort | descriptive statistics and visualization of data from the r datasets package with implications for clusterability |
topic | Mathematics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612012/ https://www.ncbi.nlm.nih.gov/pubmed/31317060 http://dx.doi.org/10.1016/j.dib.2019.104004 |
work_keys_str_mv | AT brownsteinnaomic descriptivestatisticsandvisualizationofdatafromtherdatasetspackagewithimplicationsforclusterability AT adolfssonandreas descriptivestatisticsandvisualizationofdatafromtherdatasetspackagewithimplicationsforclusterability AT ackermanmargareta descriptivestatisticsandvisualizationofdatafromtherdatasetspackagewithimplicationsforclusterability |
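The description above mentions reducing each dataset to one dimension in two ways (pairwise distances, and the first principal component) and then drawing histograms of the reduced values. The following is a minimal sketch of that idea, not the authors' code: it is written in Python rather than R, uses only the standard library, approximates the first principal component by power iteration (a substituted technique chosen to avoid dependencies), and runs on a small made-up table. The function names `pairwise_distances`, `first_component_scores`, and `histogram_counts`, and the toy rows, are all hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(rows):
    """Euclidean distance for every unordered pair of rows."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def first_component_scores(rows, iterations=200):
    """Project centered rows onto the leading eigenvector of the covariance.

    Assumption of this sketch: the eigenvector is approximated by power
    iteration from an all-ones start, which can fail if that start vector
    happens to be orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # degenerate case: keep the current vector
        vec = [x / norm for x in nxt]
    return [sum(c[j] * vec[j] for j in range(d)) for c in centered]


def histogram_counts(values, bins=5):
    """Count values in equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


# Hypothetical toy data: two loose groups of two-dimensional points.
rows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

distance_hist = histogram_counts(pairwise_distances(rows))
pca_hist = histogram_counts(first_component_scores(rows))
print(distance_hist)
print(pca_hist)
```

In this sketch a histogram with more than one clear mode would hint at cluster structure, which is the intuition the record's description points to. In R itself, `dist` and `prcomp` from the base `stats` package would be the natural counterparts.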