Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya

Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most me...

Bibliographic Details
Main authors: TREMBLAY, M., DAHM, J. S., WAMAE, C. N., DE GLANVILLE, W. A., FÈVRE, E. M., DÖPFER, D.
Format: Online Article Text
Language: English
Published: Cambridge University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657027/
https://www.ncbi.nlm.nih.gov/pubmed/25876816
http://dx.doi.org/10.1017/S0950268815000710
_version_ 1782402320242311168
author TREMBLAY, M.
DAHM, J. S.
WAMAE, C. N.
DE GLANVILLE, W. A.
FÈVRE, E. M.
DÖPFER, D.
collection PubMed
description Large datasets are often not amenable to analysis using traditional single-step approaches. Here, our general objective was to apply imputation techniques, principal component analysis (PCA), elastic net and generalized linear models to a large dataset in a systematic approach to extract the most meaningful predictors for a health outcome. We extracted predictors for Plasmodium falciparum infection, from a large covariate dataset while facing limited numbers of observations, using data from the People, Animals, and their Zoonoses (PAZ) project to demonstrate these techniques: data collected from 415 homesteads in western Kenya, contained over 1500 variables that describe the health, environment, and social factors of the humans, livestock, and the homesteads in which they reside. The wide, sparse dataset was simplified to 42 predictors of P. falciparum malaria infection and wealth rankings were produced for all homesteads. The 42 predictors make biological sense and are supported by previous studies. This systematic data-mining approach we used would make many large datasets more manageable and informative for decision-making processes and health policy prioritization.
format Online
Article
Text
id pubmed-4657027
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Cambridge University Press
record_format MEDLINE/PubMed
spelling pubmed-4657027 2015-12-02 Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya TREMBLAY, M. DAHM, J. S. WAMAE, C. N. DE GLANVILLE, W. A. FÈVRE, E. M. DÖPFER, D. Epidemiol Infect Original Papers
Cambridge University Press 2015-12 2015-04-16 /pmc/articles/PMC4657027/ /pubmed/25876816 http://dx.doi.org/10.1017/S0950268815000710 Text en © Cambridge University Press 2015 https://creativecommons.org/licenses/by/3.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya
topic Original Papers