A continuous binning for discrete, sparse and concentrated observations
Discrete observations from data which are obtained from sparse, and yet concentrated events are often observed (e.g. road accidents or murders). Traditional methods to compute summary statistics often include placing the data in discrete bins but for this type of data this approach often results in...
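Main authors: | Prieto Curiel, Rafael; Cabrera Arnau, Carmen; Torres Pinedo, Mara; González Ramírez, Humberto; Bishop, Steven Richard |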
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2019 |
Subjects: | Mathematics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994295/ https://www.ncbi.nlm.nih.gov/pubmed/32021812 http://dx.doi.org/10.1016/j.mex.2019.10.020 |
_version_ | 1783493174864379904 |
---|---|
author | Prieto Curiel, Rafael; Cabrera Arnau, Carmen; Torres Pinedo, Mara; González Ramírez, Humberto; Bishop, Steven Richard |
collection | PubMed |
description | Discrete observations from data which are obtained from sparse, and yet concentrated events are often observed (e.g. road accidents or murders). Traditional methods to compute summary statistics often include placing the data in discrete bins but for this type of data this approach often results in large numbers of empty bins for which no function or summary statistic can be computed. Here, a method for dealing with sparse and concentrated observations is constructed, based on a sequence of non-overlapping bins of varying size, which gives a continuous interpolation of data for computing summary statistics of the values for the data, such as the mean. The method presented here overcomes the problem which sparsity and concentration present when computing functions to represent the data. Implementation of the method presented here is facilitated via open access to the code. • A new method for computing functions over sparse and concentrated data is constructed. • The method allows straightforward functions to be computed over partitions of the data, such as the mean, but also more complicated functions, such as coefficients, ratios, correlations, regressions and others. |
format | Online Article Text |
id | pubmed-6994295 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-6994295; 2020-02-04; MethodsX; Mathematics; Elsevier; 2019-10-23; /pmc/articles/PMC6994295/; /pubmed/32021812; http://dx.doi.org/10.1016/j.mex.2019.10.020; Text; en; © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | A continuous binning for discrete, sparse and concentrated observations |
topic | Mathematics |
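
The description in this record mentions non-overlapping bins of varying size and a continuous interpolation of a summary statistic such as the mean. The sketch below illustrates that general idea only; it is not the authors' published code, and the record does not specify their algorithm. It assumes bins that each hold a fixed number of observations (so bins are narrow where data are concentrated and wide where they are sparse) and linear interpolation between bin centres. The names `equal_count_bins`, `interpolate` and `per_bin` are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def equal_count_bins(xs, ys, per_bin):
    """Group observations, sorted by x, into consecutive non-overlapping bins.

    Assumption (not taken from the paper): every bin holds `per_bin` points,
    and the last bin absorbs any remainder. Returns a list of
    (bin_centre, bin_mean) pairs, where the centre is the mean x in the bin.
    """
    if per_bin < 1:
        raise ValueError("per_bin must be at least 1")
    pairs = sorted(zip(xs, ys))
    if not pairs:
        raise ValueError("at least one observation is required")
    n_bins = max(1, len(pairs) // per_bin)
    summary = []
    for b in range(n_bins):
        start = b * per_bin
        end = len(pairs) if b == n_bins - 1 else start + per_bin
        chunk = pairs[start:end]
        centre = mean(x for x, _ in chunk)
        value = mean(y for _, y in chunk)
        summary.append((centre, value))
    return summary


def interpolate(summary, x):
    """Linearly interpolate the binned statistic at position x.

    Outside the range of bin centres the nearest bin value is returned.
    """
    centres = [c for c, _ in summary]
    if x <= centres[0]:
        return summary[0][1]
    if x >= centres[-1]:
        return summary[-1][1]
    # bisect_right guarantees centres[i - 1] <= x < centres[i]
    i = bisect_right(centres, x)
    (x0, y0), (x1, y1) = summary[i - 1], summary[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Two clusters of observations separated by a gap with no data.
xs = [0, 1, 2, 10, 11, 12]
ys = [1, 2, 3, 7, 8, 9]
summary = equal_count_bins(xs, ys, per_bin=3)
print(interpolate(summary, 6))  # a value is defined inside the empty gap
```

With fixed-width bins the interval between the two clusters would contain empty bins with no mean; here every bin is populated by construction, and the interpolated curve is defined everywhere between the first and last bin centre.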