Data-driven methods for imputing national-level incidence in global burden of disease studies

OBJECTIVE: To develop transparent and reproducible methods for imputing missing data on disease incidence at national-level for the year 2005. METHODS: We compared several models for imputing missing country-level incidence rates for two foodborne diseases – congenital toxoplasmosis and aflatoxin-re...

Bibliographic Details
Main authors: McDonald, Scott A, Devleesschauwer, Brecht, Speybroeck, Niko, Hens, Niel, Praet, Nicolas, Torgerson, Paul R, Havelaar, Arie H, Wu, Felicia, Tremblay, Marlène, Amene, Ermias W, Döpfer, Dörte
Format: Online Article Text
Language: English
Published: World Health Organization 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431555/
https://www.ncbi.nlm.nih.gov/pubmed/26229187
http://dx.doi.org/10.2471/BLT.14.139972
_version_ 1782371372474826752
author McDonald, Scott A
Devleesschauwer, Brecht
Speybroeck, Niko
Hens, Niel
Praet, Nicolas
Torgerson, Paul R
Havelaar, Arie H
Wu, Felicia
Tremblay, Marlène
Amene, Ermias W
Döpfer, Dörte
author_sort McDonald, Scott A
collection PubMed
description OBJECTIVE: To develop transparent and reproducible methods for imputing missing data on disease incidence at national-level for the year 2005. METHODS: We compared several models for imputing missing country-level incidence rates for two foodborne diseases – congenital toxoplasmosis and aflatoxin-related hepatocellular carcinoma. Missing values were assumed to be missing at random. Predictor variables were selected using least absolute shrinkage and selection operator regression. We compared the predictive performance of naive extrapolation approaches and Bayesian random and mixed-effects regression models. Leave-one-out cross-validation was used to evaluate model accuracy. FINDINGS: The predictive accuracy of the Bayesian mixed-effects models was significantly better than that of the naive extrapolation method for one of the two disease models. However, Bayesian mixed-effects models produced wider prediction intervals for both data sets. CONCLUSION: Several approaches are available for imputing missing data at national level. Strengths of a hierarchical regression approach for this type of task are the ability to derive estimates from other similar countries, transparency, computational efficiency and ease of interpretation. The inclusion of informative covariates may improve model performance, but results should be appraised carefully.
format Online
Article
Text
id pubmed-4431555
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher World Health Organization
record_format MEDLINE/PubMed
spelling pubmed-4431555 2015-07-30 Bull World Health Organ Research World Health Organization 2015-04-01 2015-02-27 /pmc/articles/PMC4431555/ /pubmed/26229187 http://dx.doi.org/10.2471/BLT.14.139972 Text en (c) 2015 The authors; licensee World Health Organization.
This is an open access article distributed under the terms of the Creative Commons Attribution IGO License (http://creativecommons.org/licenses/by/3.0/igo/legalcode), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In any reproduction of this article there should not be any suggestion that WHO or this article endorse any specific organization or products. The use of the WHO logo is not permitted. This notice should be preserved along with the article's original URL.
title Data-driven methods for imputing national-level incidence in global burden of disease studies
topic Research
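
The abstract in this record says that leave-one-out cross-validation was used to compare naive extrapolation with regression models. The sketch below illustrates only that evaluation loop, and rests on several assumptions: the naive rule is taken to be the mean of the remaining countries, a one-covariate least-squares line stands in for the Bayesian mixed-effects models the authors actually fitted, and the covariate and incidence values are hypothetical numbers invented for illustration. It is not the authors' code and does not reproduce their results.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def naive_mean(train_x, train_y, x_new):
    """Assumed naive extrapolation: predict the mean of the other countries."""
    return mean(train_y)


def linear(train_x, train_y, x_new):
    """Stand-in regression model (not the Bayesian model from the paper)."""
    intercept, slope = fit_line(train_x, train_y)
    return intercept + slope * x_new


def loocv_mae(xs, ys, predict):
    """Hold out each country in turn, predict it from the rest,
    and return the mean absolute prediction error."""
    errors = []
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        errors.append(abs(predict(train_x, train_y, xs[i]) - ys[i]))
    return mean(errors)


# Hypothetical data: one covariate and a log incidence rate per country.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_incidence = [0.9, 2.1, 2.9, 4.2, 4.8, 6.1]

mae_naive = loocv_mae(covariate, log_incidence, naive_mean)
mae_linear = loocv_mae(covariate, log_incidence, linear)
print(f"LOOCV MAE, naive mean: {mae_naive:.3f}")
print(f"LOOCV MAE, linear fit: {mae_linear:.3f}")
```

Because every country is predicted only from the others, the two error figures are comparable out-of-sample measures. On these invented, nearly linear values the covariate-based model does better, which need not hold for real incidence data.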