
Validation of a Machine Learning Model to Predict Childhood Lead Poisoning

IMPORTANCE: Childhood lead poisoning causes irreversible neurobehavioral deficits, but current practice is secondary prevention. OBJECTIVE: To validate a machine learning (random forest) prediction model of elevated blood lead levels (EBLLs) by comparison with a parsimonious logistic regression. DES...


Bibliographic Details
Main authors:	Potash, Eric; Ghani, Rayid; Walsh, Joe; Jorgensen, Emile; Lohff, Cortland; Prachand, Nik; Mansour, Raed
Format:	Online Article Text
Language:	English
Published:	American Medical Association 2020
Subjects:	Original Investigation
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495240/ https://www.ncbi.nlm.nih.gov/pubmed/32936296 http://dx.doi.org/10.1001/jamanetworkopen.2020.12734

author	Potash, Eric; Ghani, Rayid; Walsh, Joe; Jorgensen, Emile; Lohff, Cortland; Prachand, Nik; Mansour, Raed
collection	PubMed
description	IMPORTANCE: Childhood lead poisoning causes irreversible neurobehavioral deficits, but current practice is secondary prevention. OBJECTIVE: To validate a machine learning (random forest) prediction model of elevated blood lead levels (EBLLs) by comparison with a parsimonious logistic regression. DESIGN, SETTING, AND PARTICIPANTS: This prognostic study for temporal validation of multivariable prediction models used data from the Women, Infants, and Children (WIC) program of the Chicago Department of Public Health. Participants included a development cohort of children born from January 1, 2007, to December 31, 2012, and a validation WIC cohort born from January 1 to December 31, 2013. Blood lead levels were measured until December 31, 2018. Data were analyzed from January 1 to October 31, 2019. EXPOSURES: Blood lead level test results; lead investigation findings; housing characteristics, permits, and violations; and demographic variables. MAIN OUTCOMES AND MEASURES: Incident EBLL (≥6 μg/dL). Models were assessed using the area under the receiver operating characteristic curve (AUC) and confusion matrix metrics (positive predictive value, sensitivity, and specificity) at various thresholds. RESULTS: Among 6812 children in the WIC validation cohort, 3451 (50.7%) were female, 3057 (44.9%) were Hispanic, 2804 (41.2%) were non-Hispanic Black, 458 (6.7%) were non-Hispanic White, and 442 (6.5%) were Asian (mean [SD] age, 5.5 [0.3] years). The median year of housing construction was 1919 (interquartile range, 1903-1948). Random forest AUC was 0.69 compared with 0.64 for logistic regression (difference, 0.05; 95% CI, 0.02-0.08). 
When predicting the 5% of children at highest risk to have EBLLs, random forest and logistic regression models had positive predictive values of 15.5% and 7.8%, respectively (difference, 7.7%; 95% CI, 3.7%-11.3%), sensitivity of 16.2% and 8.1%, respectively (difference, 8.1%; 95% CI, 3.9%-11.7%), and specificity of 95.5% and 95.1% (difference, 0.4%; 95% CI, 0.0%-0.7%). CONCLUSIONS AND RELEVANCE: The machine learning model outperformed regression in predicting childhood lead poisoning, especially in identifying children at highest risk. Such a model could be used to target the allocation of lead poisoning prevention resources to these children.
format	Online Article Text
id	pubmed-7495240
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	American Medical Association
record_format	MEDLINE/PubMed
spelling	pubmed-7495240 2020-09-25 JAMA Netw Open American Medical Association 2020-09-16 /pmc/articles/PMC7495240/ /pubmed/32936296 http://dx.doi.org/10.1001/jamanetworkopen.2020.12734 Text en Copyright 2020 Potash E et al. JAMA Network Open. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title	Validation of a Machine Learning Model to Predict Childhood Lead Poisoning
topic	Original Investigation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495240/ https://www.ncbi.nlm.nih.gov/pubmed/32936296 http://dx.doi.org/10.1001/jamanetworkopen.2020.12734
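
The abstract in the description field reports AUC together with positive predictive value, sensitivity, and specificity when the 5% of children at highest predicted risk are flagged. As a rough illustration of how such figures can be computed from risk scores and observed outcomes, here is a minimal Python sketch. It is not the authors' code: the function names, the tie handling, and the toy data are assumptions made for this example, and labels are assumed to be 0/1 with 1 meaning an incident EBLL.

```python
from typing import Dict, Sequence


def metrics_at_top_fraction(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.05
) -> Dict[str, float]:
    """Flag the top `fraction` of records by score and return
    PPV, sensitivity, and specificity for those flags (hypothetical helper)."""
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    k = max(1, round(n * fraction))  # number of records flagged as high risk
    # Stable sort by descending score; ties keep their original order.
    order = sorted(range(n), key=lambda i: -scores[i])
    flagged = order[:k]
    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = k - tp
    positives = sum(1 for y in labels if y == 1)
    fn = positives - tp
    tn = n - k - fn
    nan = float("nan")
    return {
        "ppv": tp / k,
        "sensitivity": tp / positives if positives else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise form, O(P*N))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not study data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_metrics = metrics_at_top_fraction(toy_scores, toy_labels, fraction=0.2)
toy_auc = auc(toy_scores, toy_labels)
```

On the toy data, flagging the top 20% selects two records, one of which is a true positive, so PPV and sensitivity are both 0.5 and specificity is 7/8. Fixing the flagged fraction rather than a probability cutoff mirrors the abstract's framing, in which a set share of children is targeted for prevention resources.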

