
Comparison of non-parametric methods for ungrouping coarsely aggregated data

BACKGROUND: Histograms are a common tool to estimate densities non-parametrically. They are widely used in the health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age gr...


Bibliographic Details
Main authors: Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda; Christensen, Niels; Johannesen, Tom Børge; Vaupel, James W.; Lindahl-Jacobsen, Rune
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877978/
https://www.ncbi.nlm.nih.gov/pubmed/27216531
http://dx.doi.org/10.1186/s12874-016-0157-8
author Rizzi, Silvia
Thinggaard, Mikael
Engholm, Gerda
Christensen, Niels
Johannesen, Tom Børge
Vaupel, James W.
Lindahl-Jacobsen, Rune
collection PubMed
description BACKGROUND: Histograms are a common tool to estimate densities non-parametrically. They are widely used in the health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. METHODS: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. RESULTS: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs best. CONCLUSION: We give an overview of, and test, different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0157-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4877978
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4877978 2016-05-25 Comparison of non-parametric methods for ungrouping coarsely aggregated data Rizzi, Silvia Thinggaard, Mikael Engholm, Gerda Christensen, Niels Johannesen, Tom Børge Vaupel, James W. Lindahl-Jacobsen, Rune BMC Med Res Methodol Research Article BACKGROUND: Histograms are a common tool to estimate densities non-parametrically. They are widely used in the health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. METHODS: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. RESULTS: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs best. CONCLUSION: We give an overview of, and test, different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0157-8) contains supplementary material, which is available to authorized users. BioMed Central 2016-05-23 /pmc/articles/PMC4877978/ /pubmed/27216531 http://dx.doi.org/10.1186/s12874-016-0157-8 Text en © Rizzi et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comparison of non-parametric methods for ungrouping coarsely aggregated data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877978/
https://www.ncbi.nlm.nih.gov/pubmed/27216531
http://dx.doi.org/10.1186/s12874-016-0157-8