
A continuous binning for discrete, sparse and concentrated observations

Sparse yet concentrated events, such as road accidents or murders, often produce discrete observations. Traditional methods for computing summary statistics usually place the data in discrete bins, but for this type of data that approach often results in...


Bibliographic Details
Main authors: Prieto Curiel, Rafael; Cabrera Arnau, Carmen; Torres Pinedo, Mara; González Ramírez, Humberto; Bishop, Steven Richard
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects: Mathematics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6994295/
https://www.ncbi.nlm.nih.gov/pubmed/32021812
http://dx.doi.org/10.1016/j.mex.2019.10.020
author Prieto Curiel, Rafael
Cabrera Arnau, Carmen
Torres Pinedo, Mara
González Ramírez, Humberto
Bishop, Steven Richard
collection PubMed
description Sparse yet concentrated events, such as road accidents or murders, often produce discrete observations. Traditional methods for computing summary statistics usually place the data in discrete bins, but for this type of data that approach often leaves large numbers of empty bins for which no function or summary statistic can be computed. Here, a method for dealing with sparse and concentrated observations is constructed, based on a sequence of non-overlapping bins of varying size, which gives a continuous interpolation of the data for computing summary statistics, such as the mean. The method overcomes the problem that sparsity and concentration present when computing functions to represent the data. Implementation of the method is facilitated via open access to the code. • A new method for computing functions over sparse and concentrated data is constructed. • The method allows straightforward functions, such as the mean, to be computed over partitions of the data, as well as more complicated functions, such as coefficients, ratios, correlations, regressions and others.
format Online
Article
Text
id pubmed-6994295
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6994295 2020-02-04 A continuous binning for discrete, sparse and concentrated observations. Prieto Curiel, Rafael; Cabrera Arnau, Carmen; Torres Pinedo, Mara; González Ramírez, Humberto; Bishop, Steven Richard. MethodsX. Mathematics. Elsevier 2019-10-23. /pmc/articles/PMC6994295/ /pubmed/32021812 http://dx.doi.org/10.1016/j.mex.2019.10.020 Text en © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
topic Mathematics