Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys
Modern survey methods may be subject to non-observable bias, from various sources. Among online surveys, for example, selection bias is prevalent, due to the sampling mechanism commonly used, whereby participants self-select from a subgroup whose characteristics differ from those of the target popul...
Main authors: | Ferri-García, Ramón; Rueda, María del Mar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7176094/ https://www.ncbi.nlm.nih.gov/pubmed/32320429 http://dx.doi.org/10.1371/journal.pone.0231500 |
_version_ | 1783524951378100224 |
---|---|
author | Ferri-García, Ramón Rueda, María del Mar |
collection | PubMed |
description | Modern survey methods may be subject to non-observable bias, from various sources. Among online surveys, for example, selection bias is prevalent, due to the sampling mechanism commonly used, whereby participants self-select from a subgroup whose characteristics differ from those of the target population. Several techniques have been proposed to tackle this issue. One such is Propensity Score Adjustment (PSA), which is widely used and has been analysed in various studies. The usual method of estimating the propensity score is logistic regression, which requires a reference probability sample in addition to the online nonprobability sample. The predicted propensities can be used for reweighting using various estimators. However, in the online survey context, there are alternatives that might outperform logistic regression regarding propensity estimation. The aim of the present study is to determine the efficiency of some of these alternatives, involving Machine Learning (ML) classification algorithms. PSA is applied in two simulation scenarios, representing situations commonly found in online surveys, using logistic regression and ML models for propensity estimation. The results obtained show that ML algorithms remove selection bias more effectively than logistic regression when used for PSA, but that their efficacy depends largely on the selection mechanism employed and the dimensionality of the data. |
format | Online Article Text |
id | pubmed-7176094 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7176094 2020-04-27 Ferri-García, Ramón; Rueda, María del Mar. PLoS One. Research Article. Public Library of Science 2020-04-22 /pmc/articles/PMC7176094/ /pubmed/32320429 http://dx.doi.org/10.1371/journal.pone.0231500 Text en © 2020 Ferri-García, Rueda http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7176094/ https://www.ncbi.nlm.nih.gov/pubmed/32320429 http://dx.doi.org/10.1371/journal.pone.0231500 |
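
The abstract describes the baseline PSA workflow: stack the online nonprobability sample with a reference probability sample, estimate each online respondent's propensity with logistic regression, and reweight. The following is a minimal sketch of that baseline only, not the authors' code and not one of their simulation scenarios. The single covariate, the population model, the selection rule, the function names, and the weight formula (1 - π)/π (one of several estimators in use) are all assumptions made for illustration. The logistic regression is fitted with a small hand-written gradient descent so the block needs only the standard library.

```python
import math
import random


def fit_logistic(xs, zs, lr=0.5, epochs=500):
    """Fit P(z=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, z in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - z
            g1 += (p - z) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def psa_weights(x_online, x_reference):
    """Return one weight per online respondent (assumed form: (1 - pi) / pi)."""
    # Stack both samples; z = 1 marks online units, z = 0 reference units.
    xs = list(x_online) + list(x_reference)
    zs = [1] * len(x_online) + [0] * len(x_reference)
    b0, b1 = fit_logistic(xs, zs)
    # Under the logistic model, (1 - pi) / pi equals exp(-(b0 + b1 * x)).
    return [math.exp(-(b0 + b1 * x)) for x in x_online]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def simulate(seed=1, pop_size=20000, ref_size=1000):
    """Hypothetical scenario: selection into the online sample grows with x."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + x + rng.gauss(0.0, 0.5)  # outcome correlated with x
        population.append((x, y))
    pop_mean = sum(y for _, y in population) / pop_size

    # Self-selected online sample: units with larger x volunteer more often.
    online = [(x, y) for x, y in population
              if rng.random() < min(1.0, 0.05 * math.exp(0.8 * x))]
    # Reference probability sample: simple random sample without replacement.
    reference = rng.sample(population, ref_size)

    weights = psa_weights([x for x, _ in online], [x for x, _ in reference])
    naive = sum(y for _, y in online) / len(online)
    adjusted = weighted_mean([y for _, y in online], weights)
    return pop_mean, naive, adjusted


pop_mean, naive, adjusted = simulate()
print(f"population mean: {pop_mean:.3f}")
print(f"unweighted online mean: {naive:.3f}")
print(f"PSA-weighted online mean: {adjusted:.3f}")
```

In this toy setup the selection mechanism is log-linear in the covariate, so the logistic model is close to correctly specified and the weighted mean should sit much nearer the population mean than the unweighted one. The abstract's point is that with other selection mechanisms or higher-dimensional data, a different classifier could be substituted for `fit_logistic`.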