The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors

BACKGROUND: Pediatric diarrhea, a leading cause of under-five mortality, is predominantly infectious in etiology. As many putative causal agents are zoonotic, animal exposure is a likely risk factor. To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural K...

Bibliographic Details
Main Authors: Meisner, Julianne, Mooney, Stephen J., Rabinowitz, Peter M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6485705/
https://www.ncbi.nlm.nih.gov/pubmed/31026290
http://dx.doi.org/10.1371/journal.pone.0215982
_version_ 1783414289946640384
author Meisner, Julianne
Mooney, Stephen J.
Rabinowitz, Peter M.
author_facet Meisner, Julianne
Mooney, Stephen J.
Rabinowitz, Peter M.
author_sort Meisner, Julianne
collection PubMed
description BACKGROUND: Pediatric diarrhea, a leading cause of under-five mortality, is predominantly infectious in etiology. As many putative causal agents are zoonotic, animal exposure is a likely risk factor. To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural Kenya, where animal contact is common, Conan et al. studied 73 matched case-control pairs from 2009-2011, collecting rich exposure data on many dimensions of animal contact. We review the challenges associated with analyzing moderately sized datasets with a large number of predictors and present two alternative methodological approaches. METHODOLOGY/PRINCIPAL FINDINGS: We conducted a simulation study to demonstrate that forward stepwise selection results in overfit models when data are high-dimensional, and that p values reported directly from the data used to train these models are misleading. We described how automated methods of variable selection, attractive when the number of predictors is large, can result in overadjustment bias. We proposed an alternative a priori regression approach not subject to this bias. Applied to Conan et al.’s data, this approach found a non-significant but positive trend for household’s sharing of water sources with livestock or poultry, child’s presence for poultry slaughter, and child’s habit of playing where poultry sleep or defecate. For many of the predictors evaluated, few pairs were discordant, suggesting matching compromised the power of this analysis. Finally, we proposed latent variable modeling as a complementary approach and performed Item Response Theory modeling on Conan et al.’s data, with animal contact as the latent trait. We found a moderate but non-significant effect (OR 1.21, 95% CI 0.78, 1.87, unit = 1 standard deviation). CONCLUSIONS/SIGNIFICANCE: Automated methods of model selection are appropriate for prediction models when fit and evaluated on separate samples. However, when the goal is inference, these methods can produce misleading results. Furthermore, case-control matching should be done with caution.
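The abstract's point about forward stepwise selection can be illustrated with a minimal sketch. This is not the authors' simulation: it assumes a continuous outcome and ordinary least squares for simplicity, and the values `p = 50` and `k = 5` and the helper names `forward_stepwise` and `r_squared` are hypothetical. Only `n = 146` (73 pairs) echoes the sample size above. Every predictor is pure noise, so any apparent fit on the training data reflects the selection step alone.

```python
import numpy as np


def forward_stepwise(X, y, k):
    """Greedily pick k columns of X, each time adding the column that
    most reduces the residual sum of squares of a least-squares fit."""
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            # Design matrix: intercept plus already-selected columns plus candidate j.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected


def r_squared(X, y, cols, beta):
    """R^2 of a fixed coefficient vector beta evaluated on (X, y)."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    resid = y - A @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(0)
n, p, k = 146, 50, 5  # p and k are illustrative assumptions

# Outcome and predictors are independent noise: the true effect is zero.
X_train, y_train = rng.normal(size=(n, p)), rng.normal(size=n)
X_test, y_test = rng.normal(size=(n, p)), rng.normal(size=n)

cols = forward_stepwise(X_train, y_train, k)
A_train = np.column_stack([np.ones(n), X_train[:, cols]])
beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

r2_train = r_squared(X_train, y_train, cols, beta)
r2_test = r_squared(X_test, y_test, cols, beta)
print(f"selected columns: {cols}")
print(f"in-sample R^2:     {r2_train:.3f}")
print(f"out-of-sample R^2: {r2_test:.3f}")
```

Under these assumptions the in-sample R^2 is positive even though no predictor has any effect, while the same coefficients evaluated on fresh data do not reproduce that fit. This is the sense in which statistics computed on the data used for selection can mislead.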
format Online
Article
Text
id pubmed-6485705
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6485705 2019-05-09 The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors Meisner, Julianne Mooney, Stephen J. Rabinowitz, Peter M. PLoS One Research Article BACKGROUND: Pediatric diarrhea, a leading cause of under-five mortality, is predominantly infectious in etiology. As many putative causal agents are zoonotic, animal exposure is a likely risk factor. To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural Kenya, where animal contact is common, Conan et al. studied 73 matched case-control pairs from 2009-2011, collecting rich exposure data on many dimensions of animal contact. We review the challenges associated with analyzing moderately sized datasets with a large number of predictors and present two alternative methodological approaches. METHODOLOGY/PRINCIPAL FINDINGS: We conducted a simulation study to demonstrate that forward stepwise selection results in overfit models when data are high-dimensional, and that p values reported directly from the data used to train these models are misleading. We described how automated methods of variable selection, attractive when the number of predictors is large, can result in overadjustment bias. We proposed an alternative a priori regression approach not subject to this bias. Applied to Conan et al.’s data, this approach found a non-significant but positive trend for household’s sharing of water sources with livestock or poultry, child’s presence for poultry slaughter, and child’s habit of playing where poultry sleep or defecate. For many of the predictors evaluated, few pairs were discordant, suggesting matching compromised the power of this analysis. Finally, we proposed latent variable modeling as a complementary approach and performed Item Response Theory modeling on Conan et al.’s data, with animal contact as the latent trait. We found a moderate but non-significant effect (OR 1.21, 95% CI 0.78, 1.87, unit = 1 standard deviation). CONCLUSIONS/SIGNIFICANCE: Automated methods of model selection are appropriate for prediction models when fit and evaluated on separate samples. However, when the goal is inference, these methods can produce misleading results. Furthermore, case-control matching should be done with caution. Public Library of Science 2019-04-26 /pmc/articles/PMC6485705/ /pubmed/31026290 http://dx.doi.org/10.1371/journal.pone.0215982 Text en © 2019 Meisner et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Meisner, Julianne
Mooney, Stephen J.
Rabinowitz, Peter M.
The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
title The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
title_full The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
title_fullStr The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
title_full_unstemmed The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
title_short The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
title_sort curse of dimensionality: animal-related risk factors for pediatric diarrhea in western kenya, and methods for dealing with a large number of predictors
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6485705/
https://www.ncbi.nlm.nih.gov/pubmed/31026290
http://dx.doi.org/10.1371/journal.pone.0215982
work_keys_str_mv AT meisnerjulianne thecurseofdimensionalityanimalrelatedriskfactorsforpediatricdiarrheainwesternkenyaandmethodsfordealingwithalargenumberofpredictors
AT mooneystephenj thecurseofdimensionalityanimalrelatedriskfactorsforpediatricdiarrheainwesternkenyaandmethodsfordealingwithalargenumberofpredictors
AT rabinowitzpeterm thecurseofdimensionalityanimalrelatedriskfactorsforpediatricdiarrheainwesternkenyaandmethodsfordealingwithalargenumberofpredictors
AT meisnerjulianne curseofdimensionalityanimalrelatedriskfactorsforpediatricdiarrheainwesternkenyaandmethodsfordealingwithalargenumberofpredictors
AT mooneystephenj curseofdimensionalityanimalrelatedriskfactorsforpediatricdiarrheainwesternkenyaandmethodsfordealingwithalargenumberofpredictors
AT rabinowitzpeterm curseofdimensionalityanimalrelatedriskfactorsforpediatricdiarrheainwesternkenyaandmethodsfordealingwithalargenumberofpredictors