The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors
Main authors: | Meisner, Julianne; Mooney, Stephen J.; Rabinowitz, Peter M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6485705/ https://www.ncbi.nlm.nih.gov/pubmed/31026290 http://dx.doi.org/10.1371/journal.pone.0215982 |
_version_ | 1783414289946640384 |
---|---|
author | Meisner, Julianne; Mooney, Stephen J.; Rabinowitz, Peter M. |
collection | PubMed |
description | BACKGROUND: Pediatric diarrhea, a leading cause of under-five mortality, is predominantly infectious in etiology. As many putative causal agents are zoonotic, animal exposure is a likely risk factor. To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural Kenya, where animal contact is common, Conan et al. studied 73 matched case-control pairs from 2009-2011, collecting rich exposure data on many dimensions of animal contact. We review the challenges associated with analyzing moderately sized datasets with a large number of predictors and present two alternative methodological approaches. METHODOLOGY/PRINCIPAL FINDINGS: We conducted a simulation study to demonstrate that forward stepwise selection results in overfit models when data are high-dimensional, and that p values reported directly from the data used to train these models are misleading. We described how automated methods of variable selection, attractive when the number of predictors is large, can result in overadjustment bias. We proposed an alternative a priori regression approach not subject to this bias. Applied to Conan et al.’s data, this approach found a non-significant but positive trend for the household’s sharing of water sources with livestock or poultry, the child’s presence for poultry slaughter, and the child’s habit of playing where poultry sleep or defecate. For many of the predictors evaluated, few pairs were discordant, suggesting that matching compromised the power of this analysis. Finally, we proposed latent variable modeling as a complementary approach and performed Item Response Theory modeling on Conan et al.’s data, with animal contact as the latent trait. We found a moderate but non-significant effect (OR 1.21, 95% CI 0.78, 1.87, unit = 1 standard deviation). CONCLUSIONS/SIGNIFICANCE: Automated methods of model selection are appropriate for prediction models when fit and evaluated on separate samples. However, when the goal is inference, these methods can produce misleading results. Furthermore, case-control matching should be done with caution. |
format | Online Article Text |
id | pubmed-6485705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6485705 2019-05-09 PLoS One Research Article Public Library of Science 2019-04-26 /pmc/articles/PMC6485705/ /pubmed/31026290 http://dx.doi.org/10.1371/journal.pone.0215982 Text en © 2019 Meisner et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | The curse of dimensionality: Animal-related risk factors for pediatric diarrhea in western Kenya, and methods for dealing with a large number of predictors |
topic | Research Article |
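The abstract's central methodological claim, that p values taken from the same data used to select a model are misleading, can be illustrated with a small simulation. The sketch below is not the authors' simulation. It keeps only the first step of forward selection (pick the single predictor with the smallest p value) and uses an unconditional Pearson chi-square test on 2x2 tables in place of a matched (conditional) regression. It assumes 40 pure-noise binary exposures with 30% prevalence for 73 cases and 73 controls; the number of predictors, the prevalence, the seed, and all function names are hypothetical choices for illustration. Because no exposure is related to the outcome, every "significant" result is a false positive. With 40 independent tests at α = 0.05, the chance that at least one p value falls below 0.05 is 1 − 0.95^40 ≈ 0.87.

```python
import math
import random


def chi2_2x2_pvalue(exposure, outcome):
    """Pearson chi-square p value (1 df, no continuity correction) for a binary
    exposure against a binary case/control outcome."""
    n = len(outcome)
    n_cases = sum(outcome)
    n_exposed = sum(exposure)
    a = sum(x for x, y in zip(exposure, outcome) if y == 1)  # exposed cases
    b = n_cases - a      # unexposed cases
    c = n_exposed - a    # exposed controls
    d = n - n_cases - c  # unexposed controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin leaves nothing to test
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a 1-df chi-square: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def simulate_null_data(rng, n_pairs, n_predictors, prevalence):
    """Balanced cases/controls plus binary exposures that are pure noise
    (hypothetical data-generating process)."""
    outcome = [1] * n_pairs + [0] * n_pairs
    predictors = [
        [1 if rng.random() < prevalence else 0 for _ in outcome]
        for _ in range(n_predictors)
    ]
    return outcome, predictors


def selection_false_positive_rates(n_reps=500, n_pairs=73, n_predictors=40,
                                   prevalence=0.3, alpha=0.05, seed=1):
    """Return (naive rate, held-out rate) of declaring the selected predictor
    'significant' when no predictor is truly associated with the outcome."""
    rng = random.Random(seed)
    naive_hits = 0
    holdout_hits = 0
    for _ in range(n_reps):
        outcome, predictors = simulate_null_data(rng, n_pairs, n_predictors, prevalence)
        # First step of forward selection: keep the predictor with the smallest p value.
        pvals = [chi2_2x2_pvalue(x, outcome) for x in predictors]
        best = min(range(n_predictors), key=pvals.__getitem__)
        if pvals[best] < alpha:
            naive_hits += 1  # p value reported from the training data
        # Re-test on fresh subjects. Under the null every predictor has the same
        # distribution, so one freshly simulated column stands in for `best`.
        new_outcome, new_columns = simulate_null_data(rng, n_pairs, 1, prevalence)
        if chi2_2x2_pvalue(new_columns[0], new_outcome) < alpha:
            holdout_hits += 1
    return naive_hits / n_reps, holdout_hits / n_reps


if __name__ == "__main__":
    naive, holdout = selection_false_positive_rates()
    print(f"naive false-positive rate:    {naive:.3f}")
    print(f"held-out false-positive rate: {holdout:.3f}")
```

Under these assumptions the naive rate is expected to land near the 0.87 figure, while the held-out rate should sit near the nominal 0.05. This mirrors the abstract's point that automated selection is defensible when the selected model is evaluated on a separate sample, but not when its training-data p values are used for inference.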