
Using heterogeneous sources of data and interpretability of prediction models to explain the characteristics of careless respondents in survey data

Prior to further processing, completed questionnaires must be screened for the presence of careless respondents. Different people will respond to surveys in different ways. Some take the easy path and fill out the survey carelessly. The proportion of careless respondents determines the survey’s qual...


Bibliographic Details
Main authors: Kopitar, Leon; Stiglic, Gregor
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10435557/
https://www.ncbi.nlm.nih.gov/pubmed/37591974
http://dx.doi.org/10.1038/s41598-023-40209-2
author Kopitar, Leon
Stiglic, Gregor
collection PubMed
description Before further processing, completed questionnaires must be screened for careless respondents. People respond to surveys in different ways, and some take the easy path and fill out the survey carelessly. The proportion of careless respondents determines the survey's quality, so identifying them is critical for the quality of the results obtained. This study aims to explore the characteristics of careless respondents in survey data and to evaluate the predictive power and interpretability of different types of data and indices of careless responding. The research question focuses on understanding the behavior of careless respondents and determining how effectively various data sources predict their responses. The study used data from a three-month web-based survey on participants' personality traits: honesty-humility, emotionality, extraversion, agreeableness, conscientiousness and openness to experience. The data were taken from Schroeders et al. The gradient boosting machine-based prediction model uses the answers, the time spent answering, demographic information on the respondents, and some indices of careless responding derived from all three types of data. Prediction models were evaluated with tenfold cross-validation repeated a hundred times and compared on balanced accuracy. Model explanations were provided with Shapley values. Compared with existing work, data fusion from multiple types of information had no noticeable effect on the performance of the gradient boosting machine model. Variables such as “I would never take a bribe, even if it was a lot”, average longstring, and total intra-individual response variability were found to be useful in distinguishing careless respondents. However, variables like “I would be tempted to use counterfeit money if I could get away with it” and intra-individual response variability of the first section of a survey showed limited effectiveness. Additionally, this study indicated that, whereas the psychometric synonym score has an immediate effect and is designed to identify careless respondents when combined with other variables, it is not necessarily the optimal choice for fitting a gradient boosting machine model.
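The abstract names several quantities without defining them: average longstring, intra-individual response variability (IRV), and balanced accuracy. The sketch below illustrates one common way to compute each for a single respondent. It is not the authors' code. The function names, the use of the sample standard deviation for IRV, and the toy answer vectors are assumptions made for illustration only.

```python
from itertools import groupby
from statistics import mean, stdev


def run_lengths(responses):
    """Lengths of runs of identical consecutive answers, in order."""
    return [len(list(group)) for _, group in groupby(responses)]


def average_longstring(responses):
    """Mean run length of identical consecutive answers (assumed definition)."""
    return mean(run_lengths(responses))


def max_longstring(responses):
    """Longest run of identical consecutive answers."""
    return max(run_lengths(responses))


def irv(responses):
    """Intra-individual response variability, taken here as the sample
    standard deviation of one respondent's answers (an assumption)."""
    return stdev(responses)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = careless)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = sum(1 for t in y_true if t == 0)
    return 0.5 * (tp / positives + tn / negatives)


# Hypothetical answers of one respondent on a 1-5 Likert scale.
answers = [3, 3, 3, 3, 2, 5, 5, 1]
print(run_lengths(answers))         # [4, 1, 2, 1]
print(average_longstring(answers))  # 2.0
print(max_longstring(answers))      # 4

# Hypothetical labels: two careless and four attentive respondents.
print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))  # 0.625
```

Balanced accuracy is a natural comparison metric when careless respondents are a minority, because each class contributes equally to the score regardless of its size.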
format Online
Article
Text
id pubmed-10435557
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10435557 2023-08-19 Using heterogeneous sources of data and interpretability of prediction models to explain the characteristics of careless respondents in survey data Kopitar, Leon Stiglic, Gregor Sci Rep Article Nature Publishing Group UK 2023-08-17 /pmc/articles/PMC10435557/ /pubmed/37591974 http://dx.doi.org/10.1038/s41598-023-40209-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Using heterogeneous sources of data and interpretability of prediction models to explain the characteristics of careless respondents in survey data
topic Article