
A hierarchical clustering approach to identify repeated enrollments in web survey data

INTRODUCTION: Online surveys are a valuable tool for social science research, but the perceived anonymity provided by online administration may lead to problematic behaviors from study participants. Particularly, if a study offers incentives, some participants may attempt to enroll multiple times. W...


Bibliographic Details
Main authors: Handorf, Elizabeth A., Heckman, Carolyn J., Darlow, Susan, Slifker, Michael, Ritterband, Lee
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6155511/
https://www.ncbi.nlm.nih.gov/pubmed/30252908
http://dx.doi.org/10.1371/journal.pone.0204394
_version_ 1783357912199987200
author Handorf, Elizabeth A.
Heckman, Carolyn J.
Darlow, Susan
Slifker, Michael
Ritterband, Lee
author_facet Handorf, Elizabeth A.
Heckman, Carolyn J.
Darlow, Susan
Slifker, Michael
Ritterband, Lee
author_sort Handorf, Elizabeth A.
collection PubMed
description INTRODUCTION: Online surveys are a valuable tool for social science research, but the perceived anonymity provided by online administration may lead to problematic behaviors from study participants. Particularly, if a study offers incentives, some participants may attempt to enroll multiple times. We propose a method to identify clusters of non-independent enrollments in a web-based study, motivated by an analysis of survey data which tests the effectiveness of an online skin-cancer risk reduction program. METHODS: To identify groups of enrollments, we used a hierarchical clustering algorithm based on the Euclidean distance matrix formed by participant responses to a series of Likert-type eligibility questions. We then systematically identified clusters that are unusual in terms of both size and similarity, by repeatedly simulating datasets from the empirical distribution of responses under the assumption of independent enrollments. By performing the clustering algorithm on the simulated datasets, we determined the distribution of cluster size and similarity under independence, which is then used to identify groups of outliers in the observed data. Next, we assessed 12 other quality indicators, including previously proposed and study-specific measures. We summarized the quality measures by cluster membership, and compared the cluster groupings to those found when using the quality indicators with latent class modeling. RESULTS AND CONCLUSIONS: When we excluded the clustered enrollments and/or lower-quality latent classes from the analysis of study outcomes, the estimates of the intervention effect were larger. This demonstrates how including repeat or low quality participants can introduce bias into a web-based study. As much as is possible, web-based surveys should be designed to verify participant quality. Our method can be used to verify survey quality and identify problematic groups of enrollments when necessary.
format Online
Article
Text
id pubmed-6155511
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6155511 2018-10-19 A hierarchical clustering approach to identify repeated enrollments in web survey data Handorf, Elizabeth A. Heckman, Carolyn J. Darlow, Susan Slifker, Michael Ritterband, Lee PLoS One Research Article Public Library of Science 2018-09-25 /pmc/articles/PMC6155511/ /pubmed/30252908 http://dx.doi.org/10.1371/journal.pone.0204394 Text en © 2018 Handorf et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Handorf, Elizabeth A.
Heckman, Carolyn J.
Darlow, Susan
Slifker, Michael
Ritterband, Lee
A hierarchical clustering approach to identify repeated enrollments in web survey data
title A hierarchical clustering approach to identify repeated enrollments in web survey data
title_full A hierarchical clustering approach to identify repeated enrollments in web survey data
title_fullStr A hierarchical clustering approach to identify repeated enrollments in web survey data
title_full_unstemmed A hierarchical clustering approach to identify repeated enrollments in web survey data
title_short A hierarchical clustering approach to identify repeated enrollments in web survey data
title_sort hierarchical clustering approach to identify repeated enrollments in web survey data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6155511/
https://www.ncbi.nlm.nih.gov/pubmed/30252908
http://dx.doi.org/10.1371/journal.pone.0204394
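
The METHODS summary in the description field can be illustrated with a small sketch. This is not the authors' code, and several choices are assumptions made only for illustration: single linkage cut at a fixed height (the record does not state the linkage or cut rule), simulation by resampling each question independently from its observed marginal distribution, and flagging on cluster size alone rather than on both size and similarity. All function names and the toy data are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage_clusters(rows, height):
    """Single-linkage clusters cut at `height` (assumed linkage, not from the source).

    Cutting a single-linkage dendrogram at `height` gives the connected
    components of the graph joining rows whose distance is <= height,
    so a union-find over all pairs is sufficient.
    """
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if euclidean(rows[i], rows[j]) <= height:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def simulate_independent(rows, rng):
    """Assumed null model: draw each question independently from its observed values."""
    columns = list(zip(*rows))
    return [tuple(rng.choice(col) for col in columns) for _ in rows]


def size_threshold(rows, height, n_sim=200, quantile=0.95, seed=0):
    """Quantile of the largest cluster size across datasets simulated under independence."""
    rng = random.Random(seed)
    max_sizes = sorted(
        max(len(c) for c in single_linkage_clusters(simulate_independent(rows, rng), height))
        for _ in range(n_sim)
    )
    return max_sizes[min(n_sim - 1, int(quantile * n_sim))]


def flag_clusters(rows, height=1.0, **kw):
    """Return observed clusters larger than the simulated size threshold."""
    cutoff = size_threshold(rows, height, **kw)
    return [c for c in single_linkage_clusters(rows, height) if len(c) > cutoff]


# Toy data (hypothetical): 60 random respondents on six 5-point Likert items,
# followed by 8 identical enrollments at indices 60 to 67.
rng = random.Random(1)
rows = [tuple(rng.randint(1, 5) for _ in range(6)) for _ in range(60)]
rows += [(5, 1, 5, 1, 5, 1)] * 8
flagged = flag_clusters(rows, height=1.0, n_sim=100)
print(flagged)
```

In this toy setup the block of identical enrollments should appear among the flagged clusters, because independent resampling rarely produces groups that large. A fuller version would also compare within-cluster similarity against the simulated distribution, as the description indicates.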