Reweighting in panel surveys: machine learning techniques for the Health Care and Social Survey

Bibliographic Details
Main authors: Cabrera-Léon, A, Castro-Martín, L, Rueda, M, Sánchez-Cantalejo, C, Ferri-García, R
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10595118/
http://dx.doi.org/10.1093/eurpub/ckad160.1696
Collection: PubMed
Description:
BACKGROUND: Healthcare statistical services have used probability surveys to meet information needs. The Health Care and Social Survey (ESSOC) research project arises from the need to provide data on how the impact of COVID-19 evolves over time. The survey has an overlapping panel design with four measurements over one year, each based on random samples.
OBJECTIVE: To develop a new reweighting method for overlapping panel surveys affected by non-response.
METHODS: Each ESSOC measurement is composed of two samples: a longitudinal sample carried over from previous measurements and a new sample drawn at that measurement. At each measurement, units lost from the panel are replaced by newly surveyed units, so both cross-sectional and longitudinal estimates can be obtained. Besides allowing longitudinal estimation, this design makes cross-sectional estimates more accurate because of the larger sample size. However, non-response is a particularly serious problem in panel surveys, because respondents become fatigued by being surveyed repeatedly.
RESULTS: Given the design, timing and objectives of this survey, our reweighting method produces suitable estimators for both the cross-sectional and the longitudinal samples. The weights result from a two-phase process. In the first phase, the original sampling design weights are corrected. This is done by using machine learning techniques to model non-response relative to the longitudinal sample obtained in a previous measurement. In the second phase, the corrected weights are calibrated to the auxiliary information available at the population level. The proposed method is applied to the estimation of totals, proportions, ratios, and differences between measurements, as well as gender gaps.
CONCLUSIONS: To address future health crises such as COVID-19, potential coverage and non-response biases in surveys need to be reduced. Reweighting techniques such as those proposed in this study can be used for this purpose.
KEY MESSAGES:
• It is necessary to reduce potential coverage and non-response biases in surveys.
• This study proposes reweighting techniques based on calibration and XGBoost.
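As an illustration only, the two-phase weighting described in the abstract can be sketched in Python. The abstract does not specify the propensity model, the calibration method or the variables used, so every concrete choice below is an assumption.

- A weighting-class estimator (the design-weighted response rate within each age cell) stands in for the XGBoost non-response model. With XGBoost, one would instead use predicted probabilities, for example from `xgboost.XGBClassifier.predict_proba`.
- Raking ratio is used as the calibration method.
- The gender gap is taken to be the difference in weighted proportions between women and men.
- The field names (`d`, `responded`, `sex`, `age`, `y`), the sample and the population totals are hypothetical. They are not ESSOC data.

```python
from collections import defaultdict


def nonresponse_adjust(units, cell_key):
    """Phase 1 (sketch): divide each respondent's design weight "d" by an
    estimated response propensity and drop the nonrespondents.

    Stand-in assumption: the propensity is the design-weighted response
    rate in the unit's cell, not an XGBoost prediction."""
    total = defaultdict(float)
    responded = defaultdict(float)
    for u in units:
        cell = cell_key(u)
        total[cell] += u["d"]
        if u["responded"]:
            responded[cell] += u["d"]
    adjusted = []
    for u in units:
        if u["responded"]:
            cell = cell_key(u)
            propensity = responded[cell] / total[cell]
            adjusted.append({**u, "w": u["d"] / propensity})
    return adjusted


def rake(units, margins, max_iter=100, tol=1e-10):
    """Phase 2 (sketch): raking-ratio calibration. `margins` maps each
    auxiliary variable to {category: known population total}; weights are
    rescaled one variable at a time until every factor is within tol of 1."""
    units = [dict(u) for u in units]  # leave the input weights untouched
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            current = defaultdict(float)
            for u in units:
                current[u[var]] += u["w"]
            for u in units:
                factor = targets[u[var]] / current[u[var]]
                u["w"] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return units


def weighted_proportion(units, indicator, subset=lambda u: True):
    """Weighted share of the units in `subset` for which `indicator` holds."""
    den = sum(u["w"] for u in units if subset(u))
    num = sum(u["w"] for u in units if subset(u) and indicator(u))
    return num / den


# Hypothetical previous-wave longitudinal sample (not ESSOC data):
# design weight d, response indicator, auxiliary variables, outcome y.
sample = [
    {"d": 10, "responded": True, "sex": "F", "age": "young", "y": 1},
    {"d": 10, "responded": False, "sex": "M", "age": "young", "y": None},
    {"d": 10, "responded": True, "sex": "M", "age": "young", "y": 0},
    {"d": 10, "responded": True, "sex": "F", "age": "young", "y": 0},
    {"d": 20, "responded": True, "sex": "F", "age": "old", "y": 1},
    {"d": 20, "responded": False, "sex": "F", "age": "old", "y": None},
    {"d": 20, "responded": True, "sex": "M", "age": "old", "y": 1},
    {"d": 20, "responded": True, "sex": "M", "age": "old", "y": 0},
    {"d": 20, "responded": False, "sex": "M", "age": "old", "y": None},
]
# Hypothetical population totals used as calibration targets.
margins = {"sex": {"F": 110, "M": 90}, "age": {"young": 80, "old": 120}}

phase1 = nonresponse_adjust(sample, cell_key=lambda u: u["age"])
calibrated = rake(phase1, margins)


def has_outcome(u):
    return u["y"] == 1


prevalence = weighted_proportion(calibrated, has_outcome)
p_female = weighted_proportion(calibrated, has_outcome, lambda u: u["sex"] == "F")
p_male = weighted_proportion(calibrated, has_outcome, lambda u: u["sex"] == "M")
gender_gap = p_female - p_male  # difference in weighted proportions
```

Raking needs only marginal population totals, such as counts by sex and by age group. That makes it a common calibration choice when cross-classified totals are not published. For real analyses, established tools such as the `rake()` and `calibrate()` functions of R's `survey` package are preferable to a hand-rolled loop like this one.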
ID: pubmed-10595118
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Eur J Public Health, published online 2023-10-24
License: © The Author(s) 2023. Published by Oxford University Press on behalf of the European Public Health Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Topic: Poster Displays