
Developing a spatial-statistical model and map of historical malaria prevalence in Botswana using a staged variable selection procedure

BACKGROUND: Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical probl...


Bibliographic Details
Main Authors: Craig, Marlies H, Sharp, Brian L, Mabaso, Musawenkosi LH, Kleinschmidt, Immo
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2082025/
https://www.ncbi.nlm.nih.gov/pubmed/17892584
http://dx.doi.org/10.1186/1476-072X-6-44
author Craig, Marlies H
Sharp, Brian L
Mabaso, Musawenkosi LH
Kleinschmidt, Immo
collection PubMed
description BACKGROUND: Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, using various environmental data sets as predictors. Variable selection is a major obstacle because of analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains infeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana.
RESULTS: Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Those correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally, a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country.
CONCLUSION: We have produced a highly plausible and parsimonious model of historical malaria risk for Botswana from point-referenced data from a 1961/2 prevalence survey of malaria infection in children aged 1–14 years. Starting from a list of 50 potential variables, we arrived at three highly plausible predictors by applying a systematic and repeatable staged variable selection procedure that included a spatial analysis; the procedure is applicable to other environmentally determined infectious diseases. All this was accomplished using general-purpose statistical software.
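The first two stages described in the abstract (ranking candidates by the AIC of univariate logistic regressions, then counting how often each candidate is selected across bootstrap samples) can be illustrated with a minimal Python sketch. This is not the authors' code. The data are simulated, and the covariate names, effect sizes, number of sites and the helper functions are all hypothetical. It also simplifies the method: each bootstrap resample records only the covariate with the lowest univariate AIC, whereas the study ran automated step-wise selection on 1000 bootstrap samples.

```python
import math
import random


def sigmoid(eta):
    # Clamp the linear predictor so exp() cannot overflow.
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_univariate_aic(x, positives, examined, iters=50):
    """Fit logit(p) = b0 + b1*x to grouped binomial data; return the AIC."""
    b0 = b1 = 0.0
    for _ in range(iters):  # Newton-Raphson on the two coefficients
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, positives, examined):
            p = sigmoid(b0 + b1 * xi)
            resid = yi - ni * p
            w = ni * p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    loglik = 0.0  # binomial coefficient omitted: constant across models
    for xi, yi, ni in zip(x, positives, examined):
        p = sigmoid(b0 + b1 * xi)
        loglik += yi * math.log(p) + (ni - yi) * math.log(1.0 - p)
    return 2 * 2 - 2 * loglik  # two estimated parameters


def rank_by_aic(covariates, positives, examined):
    """Return (name, AIC) pairs sorted from best (lowest AIC) to worst."""
    scores = [(name, fit_univariate_aic(vals, positives, examined))
              for name, vals in covariates.items()]
    return sorted(scores, key=lambda item: item[1])


def bootstrap_selection_frequency(covariates, positives, examined, n_boot, rng):
    """Count how often each covariate ranks first across site resamples."""
    counts = {name: 0 for name in covariates}
    n_sites = len(positives)
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        sample = {name: [vals[i] for i in idx]
                  for name, vals in covariates.items()}
        y = [positives[i] for i in idx]
        n = [examined[i] for i in idx]
        counts[rank_by_aic(sample, y, n)[0][0]] += 1
    return counts


# Simulated survey: hypothetical standardized covariates at 40 sites.
rng = random.Random(2007)
n_sites = 40
covariates = {
    "summer_rainfall": [rng.gauss(0, 1) for _ in range(n_sites)],
    "mean_temperature": [rng.gauss(0, 1) for _ in range(n_sites)],
    "noise": [rng.gauss(0, 1) for _ in range(n_sites)],
}
examined = [50] * n_sites
positives = []
for i in range(n_sites):
    p_true = sigmoid(-1.0 + 1.5 * covariates["summer_rainfall"][i]
                     + 0.5 * covariates["mean_temperature"][i])
    positives.append(sum(rng.random() < p_true for _ in range(examined[i])))

n_boot = 200
ranked = rank_by_aic(covariates, positives, examined)
freq = bootstrap_selection_frequency(covariates, positives, examined, n_boot, rng)
print(ranked)
print(freq)
```

Resampling whole survey sites, rather than individual children, keeps each site's count of positives paired with its covariates. A fuller treatment would replace the single-best-variable rule with step-wise selection and finish with a spatial model, neither of which this sketch attempts.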
format Text
id pubmed-2082025
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2082025 2007-11-20 Developing a spatial-statistical model and map of historical malaria prevalence in Botswana using a staged variable selection procedure Craig, Marlies H Sharp, Brian L Mabaso, Musawenkosi LH Kleinschmidt, Immo Int J Health Geogr Research BioMed Central 2007-09-24 /pmc/articles/PMC2082025/ /pubmed/17892584 http://dx.doi.org/10.1186/1476-072X-6-44 Text en
Copyright © 2007 Craig et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Developing a spatial-statistical model and map of historical malaria prevalence in Botswana using a staged variable selection procedure
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2082025/
https://www.ncbi.nlm.nih.gov/pubmed/17892584
http://dx.doi.org/10.1186/1476-072X-6-44