The False positive problem of automatic bot detection in social science research

The identification of bots is an important and complicated task. The bot classifier "Botometer" was successfully introduced as a way to estimate the number of bots in a given list of accounts and, as a consequence, has been frequently used in academic publications. Given its relevance for...

Bibliographic Details
Main Authors: Rauchfleisch, Adrian; Kaiser, Jonas
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7580919/
https://www.ncbi.nlm.nih.gov/pubmed/33091067
http://dx.doi.org/10.1371/journal.pone.0241045
_version_ 1783598868539113472
author Rauchfleisch, Adrian
Kaiser, Jonas
author_facet Rauchfleisch, Adrian
Kaiser, Jonas
author_sort Rauchfleisch, Adrian
collection PubMed
description The identification of bots is an important and complicated task. The bot classifier "Botometer" was successfully introduced as a way to estimate the number of bots in a given list of accounts and, as a consequence, has been frequently used in academic publications. Given its relevance for academic research and our understanding of the presence of automated accounts in any given Twitter discourse, we are interested in Botometer’s diagnostic ability over time. To do so, we collected the Botometer scores for five datasets (three verified as bots, two verified as human; n = 4,134) in two languages (English/German) over three months. We show that the Botometer scores are imprecise when it comes to estimating bots; especially in a different language. We further show in an analysis of Botometer scores over time that Botometer's thresholds, even when used very conservatively, are prone to variance, which, in turn, will lead to false negatives (i.e., bots being classified as humans) and false positives (i.e., humans being classified as bots). This has immediate consequences for academic research as most studies in social science using the tool will unknowingly count a high number of human users as bots and vice versa. We conclude our study with a discussion about how computational social scientists should evaluate machine learning systems that are developed for identifying bots.
format Online
Article
Text
id pubmed-7580919
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-75809192020-10-27 The False positive problem of automatic bot detection in social science research Rauchfleisch, Adrian Kaiser, Jonas PLoS One Research Article The identification of bots is an important and complicated task. The bot classifier "Botometer" was successfully introduced as a way to estimate the number of bots in a given list of accounts and, as a consequence, has been frequently used in academic publications. Given its relevance for academic research and our understanding of the presence of automated accounts in any given Twitter discourse, we are interested in Botometer’s diagnostic ability over time. To do so, we collected the Botometer scores for five datasets (three verified as bots, two verified as human; n = 4,134) in two languages (English/German) over three months. We show that the Botometer scores are imprecise when it comes to estimating bots; especially in a different language. We further show in an analysis of Botometer scores over time that Botometer's thresholds, even when used very conservatively, are prone to variance, which, in turn, will lead to false negatives (i.e., bots being classified as humans) and false positives (i.e., humans being classified as bots). This has immediate consequences for academic research as most studies in social science using the tool will unknowingly count a high number of human users as bots and vice versa. We conclude our study with a discussion about how computational social scientists should evaluate machine learning systems that are developed for identifying bots. 
Public Library of Science 2020-10-22 /pmc/articles/PMC7580919/ /pubmed/33091067 http://dx.doi.org/10.1371/journal.pone.0241045 Text en © 2020 Rauchfleisch, Kaiser http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Rauchfleisch, Adrian
Kaiser, Jonas
The False positive problem of automatic bot detection in social science research
title The False positive problem of automatic bot detection in social science research
title_full The False positive problem of automatic bot detection in social science research
title_fullStr The False positive problem of automatic bot detection in social science research
title_full_unstemmed The False positive problem of automatic bot detection in social science research
title_short The False positive problem of automatic bot detection in social science research
title_sort false positive problem of automatic bot detection in social science research
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7580919/
https://www.ncbi.nlm.nih.gov/pubmed/33091067
http://dx.doi.org/10.1371/journal.pone.0241045
work_keys_str_mv AT rauchfleischadrian thefalsepositiveproblemofautomaticbotdetectioninsocialscienceresearch
AT kaiserjonas thefalsepositiveproblemofautomaticbotdetectioninsocialscienceresearch
AT rauchfleischadrian falsepositiveproblemofautomaticbotdetectioninsocialscienceresearch
AT kaiserjonas falsepositiveproblemofautomaticbotdetectioninsocialscienceresearch