Using machine learning to impute legal status of immigrants in the National Health Interview Survey

We describe a novel machine learning method of imputing legal status for immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). K-nearest Neighbor (KNN) classifier and Random Forest (RF) Algor...

Bibliographic Details
Main authors: Ruhnke, Simon A., Wilson, Fernando A., Stimpson, Jim P.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9490167/
https://www.ncbi.nlm.nih.gov/pubmed/36160111
http://dx.doi.org/10.1016/j.mex.2022.101848
author Ruhnke, Simon A.
Wilson, Fernando A.
Stimpson, Jim P.
collection PubMed
description We describe a novel machine learning method of imputing legal status for immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). We present the K-nearest neighbor (KNN) classifier and the Random Forest (RF) algorithm as novel imputation methods and compare them with established regression-based imputation. Validation using sensitivity, specificity, positive predictive value (PPV), and accuracy statistics showed that the Random Forest algorithm was more accurate in identifying undocumented immigrants and minimized bias, both in the socio-demographic variables included in the imputation and in unobserved health variables, relative to regression-based imputation and KNN.
• We developed a new machine learning method of imputing legal status for immigrants that can be used with nationally representative, publicly available data.
• Our findings indicate that using machine learning to impute the legal status of immigrants, specifically with the Random Forest algorithm, was more accurate in identifying undocumented immigrants and minimized bias relative to other imputation methods.
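The four validation statistics named in the description can be illustrated with a minimal sketch. It assumes a validation sample in which both the true and the imputed legal status are known for each respondent. The function name `validation_metrics`, the label strings, and the toy data are hypothetical and are not taken from the article.

```python
import math


def validation_metrics(true_labels, imputed_labels, positive="undocumented"):
    """Compare imputed labels with known labels for one positive class.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(true_labels) != len(imputed_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_labels, imputed_labels))
    # Cells of the 2x2 confusion matrix for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is empty.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives found
        "specificity": ratio(tn, tn + fp),  # true negatives found
        "ppv": ratio(tp, tp + fp),          # imputed positives that are correct
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy validation sample (made-up labels, for illustration only).
true_status = ["undocumented", "undocumented", "documented", "documented",
               "documented", "undocumented", "documented", "documented"]
imputed_status = ["undocumented", "documented", "documented", "undocumented",
                  "documented", "undocumented", "documented", "documented"]

metrics = validation_metrics(true_status, imputed_status)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the same dictionary for each imputation method (regression-based, KNN, RF) on one validation sample would allow the kind of side-by-side comparison the description reports.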
format Online
Article
Text
id pubmed-9490167
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9490167 2022-09-22 MethodsX Method Article Elsevier 2022-09-08 /pmc/articles/PMC9490167/ /pubmed/36160111 http://dx.doi.org/10.1016/j.mex.2022.101848 Text en © 2022 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Using machine learning to impute legal status of immigrants in the National Health Interview Survey
topic Method Article