Enhancing estimation methods for integrating probability and nonprobability survey samples with machine‐learning techniques. An application to a Survey on the impact of the COVID‐19 pandemic in Spain

Bibliographic Details
Main authors: Rueda, María del Mar; Pasadas‐del‐Amo, Sara; Rodríguez, Beatriz Cobo; Castro‐Martín, Luis; Ferri‐García, Ramón
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9538074/
https://www.ncbi.nlm.nih.gov/pubmed/36136044
http://dx.doi.org/10.1002/bimj.202200035
author Rueda, María del Mar
Pasadas‐del‐Amo, Sara
Rodríguez, Beatriz Cobo
Castro‐Martín, Luis
Ferri‐García, Ramón
collection PubMed
description Web surveys have replaced face‐to‐face and computer‐assisted telephone interviewing (CATI) as the main mode of data collection in most countries. This trend was reinforced as a consequence of COVID‐19 pandemic‐related restrictions. However, this mode still faces significant limitations in obtaining probability‐based samples of the general population. For this reason, most web surveys rely on nonprobability survey designs. Whereas probability‐based designs continue to be the gold standard in survey sampling, nonprobability web surveys may still prove useful in some situations. For instance, when small subpopulations are the group under study and probability sampling is unlikely to meet sample size requirements, complementing a small probability sample with a larger nonprobability one may improve the efficiency of the estimates. Nonprobability samples may also be designed as a means of compensating for known biases in probability‐based web survey samples by purposely targeting respondent profiles that tend to be underrepresented in these surveys. This is the case in the Survey on the impact of the COVID‐19 pandemic in Spain (ESPACOV) that motivates this paper. In this paper, we propose a methodology for combining probability and nonprobability web‐based survey samples with the help of machine‐learning techniques. We then assess the efficiency of the resulting estimates by comparing them with other strategies that have been used before. Our simulation study and the application of the proposed estimation method to the second wave of the ESPACOV Survey allow us to conclude that the proposed method is the best of the options compared for reducing the biases observed in our data.
format Online
Article
Text
id pubmed-9538074
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9538074 2022-10-11 Biom J Research Articles John Wiley and Sons Inc. 2022-09-22 /pmc/articles/PMC9538074/ /pubmed/36136044 http://dx.doi.org/10.1002/bimj.202200035 Text en © 2022 The Authors. Biometrical Journal published by Wiley‐VCH GmbH. This is an open access article under the terms of the Creative Commons BY‐NC‐ND 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title Enhancing estimation methods for integrating probability and nonprobability survey samples with machine‐learning techniques. An application to a Survey on the impact of the COVID‐19 pandemic in Spain
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9538074/
https://www.ncbi.nlm.nih.gov/pubmed/36136044
http://dx.doi.org/10.1002/bimj.202200035