
Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys

Bibliographic Details
Main authors: Ferri-García, Ramón; Rueda, María del Mar
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7176094/
https://www.ncbi.nlm.nih.gov/pubmed/32320429
http://dx.doi.org/10.1371/journal.pone.0231500
author Ferri-García, Ramón
Rueda, María del Mar
collection PubMed
description Modern survey methods may be subject to non-observable bias, from various sources. Among online surveys, for example, selection bias is prevalent, due to the sampling mechanism commonly used, whereby participants self-select from a subgroup whose characteristics differ from those of the target population. Several techniques have been proposed to tackle this issue. One such is Propensity Score Adjustment (PSA), which is widely used and has been analysed in various studies. The usual method of estimating the propensity score is logistic regression, which requires a reference probability sample in addition to the online nonprobability sample. The predicted propensities can be used for reweighting using various estimators. However, in the online survey context, there are alternatives that might outperform logistic regression regarding propensity estimation. The aim of the present study is to determine the efficiency of some of these alternatives, involving Machine Learning (ML) classification algorithms. PSA is applied in two simulation scenarios, representing situations commonly found in online surveys, using logistic regression and ML models for propensity estimation. The results obtained show that ML algorithms remove selection bias more effectively than logistic regression when used for PSA, but that their efficacy depends largely on the selection mechanism employed and the dimensionality of the data.
format Online
Article
Text
id pubmed-7176094
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7176094 2020-04-27 Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys Ferri-García, Ramón Rueda, María del Mar PLoS One Research Article
Public Library of Science 2020-04-22 /pmc/articles/PMC7176094/ /pubmed/32320429 http://dx.doi.org/10.1371/journal.pone.0231500 Text en © 2020 Ferri-García, Rueda http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys
topic Research Article
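The abstract describes the baseline form of Propensity Score Adjustment (PSA): pool the online nonprobability sample with a reference probability sample, fit a logistic regression for membership in the online sample, and reweight volunteers using the predicted propensities. The following is a minimal sketch of that baseline only, under stated assumptions. The reference sample is treated as self-weighting (equal design weights). The reweighting estimator is assumed to be the inverse-odds weight (1 - p)/p, one common choice; the study itself compares several estimators and ML classifiers, and none of those are reproduced here. The function names `fit_logistic`, `propensity`, and `psa_mean` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def propensity(beta: Sequence[float], row: Sequence[float]) -> float:
    """Estimated probability of belonging to the online sample (logistic link)."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X: Sequence[Sequence[float]], z: Sequence[int],
                 lr: float = 1.0, iters: int = 1000) -> List[float]:
    """Unpenalized logistic regression fitted by batch gradient descent.

    z[i] = 1 marks the online (nonprobability) sample, z[i] = 0 the reference
    probability sample. Covariates are assumed to be on a modest scale; rescale
    them (or lower lr) otherwise.
    """
    n, k = len(X), len(X[0]) + 1
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, zi in zip(X, z):
            resid = propensity(beta, row) - zi
            grad[0] += resid  # intercept term
            for j, v in enumerate(row, start=1):
                grad[j] += resid * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def psa_mean(online_X, online_y, ref_X, lr: float = 1.0, iters: int = 1000) -> float:
    """PSA estimate of the population mean of y from the online sample."""
    X = list(online_X) + list(ref_X)
    z = [1] * len(online_X) + [0] * len(ref_X)
    beta = fit_logistic(X, z, lr, iters)
    # Assumed estimator: inverse-odds weights (1 - p) / p, which reweight
    # volunteers toward the covariate distribution of an equal-weight reference sample.
    weights = [(1.0 - p) / p for p in (propensity(beta, row) for row in online_X)]
    return sum(w * y for w, y in zip(weights, online_y)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical toy data with one binary covariate x: the reference sample
    # is split 50/50, while the online sample over-represents x = 1.
    ref_X = [[0.0]] * 50 + [[1.0]] * 50
    online_X = [[0.0]] * 20 + [[1.0]] * 80
    online_y = [10.0] * 20 + [20.0] * 80
    naive = sum(online_y) / len(online_y)
    adjusted = psa_mean(online_X, online_y, ref_X)
    print(f"naive mean: {naive:.2f}, PSA-adjusted mean: {adjusted:.2f}")
```

With this toy data the script prints `naive mean: 18.00, PSA-adjusted mean: 15.00`. The adjusted value equals each x-group's volunteer mean weighted by that group's share of the reference sample. Plain weights 1/p would instead target the pooled sample's covariate mix (16.5 here), which is why the sketch uses inverse odds. The ML alternatives the study evaluates would replace `fit_logistic` and `propensity` with another classifier's predicted membership probabilities.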