Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach

BACKGROUND: Malaria risk maps play an increasingly important role in disease control planning, implementation, and evaluation. The construction of these maps using modern geospatial techniques relies on covariate grids: continuous surfaces quantifying environmental factors that partially explain spa...

Bibliographic Details
Main Authors: Weiss, Daniel J; Mappin, Bonnie; Dalrymple, Ursula; Bhatt, Samir; Cameron, Ewan; Hay, Simon I; Gething, Peter W
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333887/
https://www.ncbi.nlm.nih.gov/pubmed/25890035
http://dx.doi.org/10.1186/s12936-015-0574-x
_version_ 1782358119822655488
author Weiss, Daniel J
Mappin, Bonnie
Dalrymple, Ursula
Bhatt, Samir
Cameron, Ewan
Hay, Simon I
Gething, Peter W
author_facet Weiss, Daniel J
Mappin, Bonnie
Dalrymple, Ursula
Bhatt, Samir
Cameron, Ewan
Hay, Simon I
Gething, Peter W
author_sort Weiss, Daniel J
collection PubMed
description BACKGROUND: Malaria risk maps play an increasingly important role in disease control planning, implementation, and evaluation. The construction of these maps using modern geospatial techniques relies on covariate grids: continuous surfaces quantifying environmental factors that partially explain spatial heterogeneity in malaria endemicity. Although crucial, past variable selection processes for this purpose have often been subjective and ad-hoc, with many covariates used in modeling with little quantitative justification. METHODS: This research consists of an extensive covariate construction and selection process for predicting Plasmodium falciparum parasite rates (PfPR) in Africa for years 2000-2012. First, a literature review was conducted to establish a comprehensive list of covariates used for malaria mapping. Second, a library of covariate data was assembled to reflect this list, a process that included the construction of multiple, temporally dynamic datasets. Third, the resulting set of covariates was leveraged to create more than 50 million possible covariate terms via factorial combinations of different spatial and temporal aggregations, transformations, and pairwise interactions. Fourth, the expanded set of covariates was reduced via successive selection criteria to yield a robust covariate subset that was assessed using an out-of-sample validation approach. RESULTS: The final covariate subset included predominately dynamic covariates and it substantially out-performed earlier sets used by the Malaria Atlas Project (MAP) for creating global malaria risk maps, with the pseudo-R² value for the out-of-sample validation increasing from 0.43 to 0.52. Dynamic covariates improved the model, with 17 of the 20 new covariates consisting of monthly or annual products, but the selected covariates were typically interaction terms that included both dynamic and synoptic datasets.
Thus the interplay between normal (i.e., long-term averages) and immediate conditions may be key for characterizing environmental controls on parasite rate. CONCLUSIONS: This analysis represents the first effort to systematically audit covariate utility for malaria mapping and then derive an objective, empirically based set of environmental covariates for modeling PfPR. The new covariates produce more reliable representations of malaria risk patterns and how they are changing through time, and these covariates will be used to characterize spatially and temporally varying environmental conditions affecting PfPR within a geostatistical-modeling framework, thus building upon previous research by MAP that produced global malaria maps for 2007 and 2010. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12936-015-0574-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4333887
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-43338872015-02-20 Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach Weiss, Daniel J Mappin, Bonnie Dalrymple, Ursula Bhatt, Samir Cameron, Ewan Hay, Simon I Gething, Peter W Malar J Research BACKGROUND: Malaria risk maps play an increasingly important role in disease control planning, implementation, and evaluation. The construction of these maps using modern geospatial techniques relies on covariate grids: continuous surfaces quantifying environmental factors that partially explain spatial heterogeneity in malaria endemicity. Although crucial, past variable selection processes for this purpose have often been subjective and ad-hoc, with many covariates used in modeling with little quantitative justification. METHODS: This research consists of an extensive covariate construction and selection process for predicting Plasmodium falciparum parasite rates (PfPR) in Africa for years 2000-2012. First, a literature review was conducted to establish a comprehensive list of covariates used for malaria mapping. Second, a library of covariate data was assembled to reflect this list, a process that included the construction of multiple, temporally dynamic datasets. Third, the resulting set of covariates was leveraged to create more than 50 million possible covariate terms via factorial combinations of different spatial and temporal aggregations, transformations, and pairwise interactions. Fourth, the expanded set of covariates was reduced via successive selection criteria to yield a robust covariate subset that was assessed using an out-of-sample validation approach. RESULTS: The final covariate subset included predominately dynamic covariates and it substantially out-performed earlier sets used by the Malaria Atlas Project (MAP) for creating global malaria risk maps, with the pseudo-R(2) value for the out-of-sample validation increasing from 0.43 to 0.52. 
Dynamic covariates improved the model, with 17 of the 20 new covariates consisting of monthly or annual products, but the selected covariates were typically interaction terms that included both dynamic and synoptic datasets. Thus the interplay between normal (i.e., long-term averages) and immediate conditions may be key for characterizing environmental controls on parasite rate. CONCLUSIONS: This analysis represents the first effort to systematically audit covariate utility for malaria mapping and then derive an objective, empirically based set of environmental covariates for modeling PfPR. The new covariates produce more reliable representations of malaria risk patterns and how they are changing through time, and these covariates will be used to characterize spatially and temporally varying environmental conditions affecting PfPR within a geostatistical-modeling framework, thus building upon previous research by MAP that produced global malaria maps for 2007 and 2010. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12936-015-0574-x) contains supplementary material, which is available to authorized users. BioMed Central 2015-02-07 /pmc/articles/PMC4333887/ /pubmed/25890035 http://dx.doi.org/10.1186/s12936-015-0574-x Text en © Weiss et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Weiss, Daniel J
Mappin, Bonnie
Dalrymple, Ursula
Bhatt, Samir
Cameron, Ewan
Hay, Simon I
Gething, Peter W
Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
title Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
title_full Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
title_fullStr Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
title_full_unstemmed Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
title_short Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
title_sort re-examining environmental correlates of plasmodium falciparum malaria endemicity: a data-intensive variable selection approach
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333887/
https://www.ncbi.nlm.nih.gov/pubmed/25890035
http://dx.doi.org/10.1186/s12936-015-0574-x
work_keys_str_mv AT weissdanielj reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach
AT mappinbonnie reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach
AT dalrympleursula reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach
AT bhattsamir reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach
AT cameronewan reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach
AT haysimoni reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach
AT gethingpeterw reexaminingenvironmentalcorrelatesofplasmodiumfalciparummalariaendemicityadataintensivevariableselectionapproach