Net activism and whistleblowing on YouTube: a text mining analysis

Social media is more and more dominant in everyday life for people around the world. YouTube content is a resource that may be useful, in social computational science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video t...

Bibliographic Details
Main author: Turenne, Nicolas
Format: Online Article Text
Language: English
Published: Springer US 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520105/
https://www.ncbi.nlm.nih.gov/pubmed/36193288
http://dx.doi.org/10.1007/s11042-022-13777-0
_version_ 1784799549974380544
author Turenne, Nicolas
author_facet Turenne, Nicolas
author_sort Turenne, Nicolas
collection PubMed
description Social media is more and more dominant in everyday life for people around the world. YouTube content is a resource that may be useful, in social computational science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video transcriptions concerning net activism and whistleblowing. We automatically performed linguistic feature extraction to capture a representation of each video using its title, description and transcription (downloaded metadata). The next step was to clean the dataset using automatic clustering with linguistic representation to identify unmatched videos and noisy keywords. Using these keywords to exclude videos, we finally obtained a dataset that was reduced by 95%, i.e., it contained 35,730 video transcriptions. Then, we again automatically clustered the videos using a lexical representation and split the dataset into subsets, leading to hundreds of clusters that we interpreted manually to identify a hierarchy of topics of interest concerning whistleblowing. We used the dataset to learn a lexical representation for a specific topic and to detect unknown whistleblowing videos for this topic; the accuracy of this detection is 57.4%. We also used the dataset to identify interesting context linguistic markers around the names of whistleblowers. From a given list of names, we automatically extracted all 5-gram word sequences from the dataset and identified interesting markers in the left and right contexts for each name by manual interpretation. The results of our study are the following: a dataset (raw and cleaned collections) concerning whistleblowing, a hierarchy of topics about whistleblowing, the automatic prediction of whistleblowing and the semi-automatic semantic analysis of markers around whistleblower names. This text mining analysis can be exploited for digital sociology and e-democracy studies.
format Online
Article
Text
id pubmed-9520105
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-9520105 2022-09-29 Net activism and whistleblowing on YouTube: a text mining analysis Turenne, Nicolas Multimed Tools Appl 1209: Recent Advances on Social Media Analytics and Multimedia Systems: Issues and Challenges Springer US 2022-09-29 2023 /pmc/articles/PMC9520105/ /pubmed/36193288 http://dx.doi.org/10.1007/s11042-022-13777-0 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle 1209: Recent Advances on Social Media Analytics and Multimedia Systems: Issues and Challenges
Turenne, Nicolas
Net activism and whistleblowing on YouTube: a text mining analysis
title Net activism and whistleblowing on YouTube: a text mining analysis
title_full Net activism and whistleblowing on YouTube: a text mining analysis
title_fullStr Net activism and whistleblowing on YouTube: a text mining analysis
title_full_unstemmed Net activism and whistleblowing on YouTube: a text mining analysis
title_short Net activism and whistleblowing on YouTube: a text mining analysis
title_sort net activism and whistleblowing on youtube: a text mining analysis
topic 1209: Recent Advances on Social Media Analytics and Multimedia Systems: Issues and Challenges
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520105/
https://www.ncbi.nlm.nih.gov/pubmed/36193288
http://dx.doi.org/10.1007/s11042-022-13777-0
work_keys_str_mv AT turennenicolas netactivismandwhistleblowingonyoutubeatextmininganalysis
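
The abstract mentions extracting 5-gram word sequences around whistleblower names and inspecting the left and right contexts. The record gives no implementation details, so the following is only a minimal sketch of one way such an extraction could look. The function names (`tokenize`, `ngrams`, `context_markers`), the regex tokenizer, and the restriction to single-token names are all assumptions made for illustration, not the author's method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n=5):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def context_markers(texts, name, n=5):
    """Count tokens appearing left and right of `name` within n-grams containing it.

    Hypothetical helper: `name` must be a single token (e.g. a surname).
    A token close to the name falls into several overlapping windows,
    so it is counted once per window.
    """
    target = name.lower()
    left, right = Counter(), Counter()
    for text in texts:
        for gram in ngrams(tokenize(text), n):
            if target not in gram:
                continue
            pos = gram.index(target)
            left.update(gram[:pos])       # tokens before the name
            right.update(gram[pos + 1:])  # tokens after the name
    return left, right
```

Calling `context_markers(transcripts, "snowden")` on a list of transcript strings returns two counters whose most frequent entries are candidate markers; the abstract states that the actual selection of interesting markers was done by manual interpretation.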