
Polling India via regression and post-stratification of non-probability online samples

Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modif...


Bibliographic Details
Main Authors:	Cerina, Roberto, Duch, Raymond
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8629219/ https://www.ncbi.nlm.nih.gov/pubmed/34843519 http://dx.doi.org/10.1371/journal.pone.0260092

_version_	1784607157987049472
author	Cerina, Roberto Duch, Raymond
author_facet	Cerina, Roberto Duch, Raymond
author_sort	Cerina, Roberto
collection	PubMed
description	Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified MrP model of Indian vote preferences that proposes innovations to each of its three core components: stratification frame, training data, and a learner. For the post-stratification frame we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. And as a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed value of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from ‘Jan Ki Baat’) and the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.
format	Online Article Text
id	pubmed-8629219
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8629219 2021-11-30 Polling India via regression and post-stratification of non-probability online samples Cerina, Roberto Duch, Raymond PLoS One Research Article Public Library of Science 2021-11-29 /pmc/articles/PMC8629219/ /pubmed/34843519 http://dx.doi.org/10.1371/journal.pone.0260092 Text en © 2021 Cerina, Duch https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Cerina, Roberto Duch, Raymond Polling India via regression and post-stratification of non-probability online samples
title	Polling India via regression and post-stratification of non-probability online samples
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8629219/ https://www.ncbi.nlm.nih.gov/pubmed/34843519 http://dx.doi.org/10.1371/journal.pone.0260092
work_keys_str_mv	AT cerinaroberto pollingindiaviaregressionandpoststratificationofnonprobabilityonlinesamples AT duchraymond pollingindiaviaregressionandpoststratificationofnonprobabilityonlinesamples
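
The post-stratification step named in the abstract can be illustrated with a minimal sketch. It assumes that some learner (Random Forests in the article) has already produced a predicted vote share for every cell of the stratification frame; the cell definitions, predictions, and counts below are made-up placeholders, not figures from the article, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical stratification cell: (state, age group). The article's frame
# uses its own variables; these are placeholders for illustration only.
Cell = Tuple[str, str]


def poststratify(cell_pred: Dict[Cell, float], cell_count: Dict[Cell, float]) -> float:
    """Return the population estimate as the count-weighted mean of cell predictions."""
    total = sum(cell_count[c] for c in cell_pred)
    if total <= 0:
        raise ValueError("stratification frame has no population")
    return sum(cell_pred[c] * cell_count[c] for c in cell_pred) / total


# Made-up per-cell predicted vote shares (what a fitted learner would output).
cell_pred = {
    ("A", "18-34"): 0.50,
    ("A", "35+"): 0.40,
    ("B", "18-34"): 0.30,
    ("B", "35+"): 0.20,
}

# Made-up population counts per cell (what the stratification frame supplies).
cell_count = {
    ("A", "18-34"): 100,
    ("A", "35+"): 300,
    ("B", "18-34"): 200,
    ("B", "35+"): 400,
}

estimate = poststratify(cell_pred, cell_count)
print(f"post-stratified vote share: {estimate:.3f}")
```

Because each cell is weighted by its population count rather than by how many respondents fall into it, an unrepresentative online sample can still yield a population-level estimate, provided the per-cell predictions are reasonable.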
