Using machine learning to impute legal status of immigrants in the National Health Interview Survey
Main authors: | Ruhnke, Simon A.; Wilson, Fernando A.; Stimpson, Jim P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9490167/ https://www.ncbi.nlm.nih.gov/pubmed/36160111 http://dx.doi.org/10.1016/j.mex.2022.101848 |
Field | Value |
---|---|
author | Ruhnke, Simon A.; Wilson, Fernando A.; Stimpson, Jim P. |
collection | PubMed |
description | We describe a novel machine learning method of imputing legal status for immigrants using nationally representative survey data from the Survey of Income and Program Participation (SIPP) and the National Health Interview Survey (NHIS). The K-nearest neighbor (KNN) classifier and the Random Forest (RF) algorithm are described as novel imputation methods and compared with established regression-based imputation. In validation using sensitivity, specificity, positive predictive value (PPV), and accuracy statistics, the Random Forest algorithm was more accurate than regression-based imputation and KNN in identifying undocumented immigrants, and it minimized bias both in the socio-demographic variables included in the imputation and in unobserved health variables. • We developed a new machine learning method of imputing legal status for immigrants that can be used with nationally representative, publicly available data. • Our findings indicate that machine learning imputation of immigrants' legal status, specifically with the Random Forest algorithm, was more accurate in identifying undocumented immigrants and minimized bias relative to other imputation methods. |
format | Online Article Text |
id | pubmed-9490167 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9490167, 2022-09-22. MethodsX, Method Article. Elsevier, 2022-09-08. © 2022 The Author(s). This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/). |
title | Using machine learning to impute legal status of immigrants in the National Health Interview Survey |
topic | Method Article |
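The description reports validating each imputation method with sensitivity, specificity, positive predictive value (PPV), and accuracy. As a minimal sketch (not the authors' code), assuming legal status is coded as a binary label (1 = undocumented, 0 = otherwise) and that observed and imputed statuses are available for the same validation cases, these statistics follow directly from confusion-matrix counts. The function name `validate_imputation`, the 0/1 coding, and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationStats:
    sensitivity: float  # TP / (TP + FN): share of truly undocumented cases imputed as undocumented
    specificity: float  # TN / (TN + FP): share of other cases imputed as not undocumented
    ppv: float          # TP / (TP + FP): share of imputed-undocumented cases that truly are
    accuracy: float     # (TP + TN) / N: share of all cases imputed correctly


def validate_imputation(observed, imputed):
    """Compare observed vs. imputed status (hypothetical coding: 1 = undocumented, 0 = otherwise)."""
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")
    if any(v not in (0, 1) for v in list(observed) + list(imputed)):
        raise ValueError("labels must be coded 0 or 1")

    pairs = list(zip(observed, imputed))
    tp = sum(1 for o, i in pairs if o == 1 and i == 1)
    tn = sum(1 for o, i in pairs if o == 0 and i == 0)
    fp = sum(1 for o, i in pairs if o == 0 and i == 1)
    fn = sum(1 for o, i in pairs if o == 1 and i == 0)

    def ratio(num, den):
        # Return NaN instead of dividing by zero when a class is absent.
        return num / den if den else float("nan")

    return ValidationStats(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        accuracy=ratio(tp + tn, len(pairs)),
    )


# Example with 8 hypothetical validation cases: TP = 2, FN = 1, FP = 1, TN = 4.
stats = validate_imputation([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
# stats.sensitivity ≈ 0.667, stats.specificity == 0.8, stats.ppv ≈ 0.667, stats.accuracy == 0.75
```

This sketch counts every case equally; it does not apply survey sampling weights, which SIPP and NHIS both provide and which may matter for population-level estimates.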