Net activism and whistleblowing on YouTube: a text mining analysis
Social media is more and more dominant in everyday life for people around the world. YouTube content is a resource that may be useful, in social computational science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video t...
Main author: | Turenne, Nicolas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520105/ https://www.ncbi.nlm.nih.gov/pubmed/36193288 http://dx.doi.org/10.1007/s11042-022-13777-0 |
_version_ | 1784799549974380544 |
---|---|
author | Turenne, Nicolas |
author_facet | Turenne, Nicolas |
author_sort | Turenne, Nicolas |
collection | PubMed |
description | Social media is more and more dominant in everyday life for people around the world. YouTube content is a resource that may be useful, in social computational science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video transcriptions concerning net activism and whistleblowing. We automatically performed linguistic feature extraction to capture a representation of each video using its title, description and transcription (downloaded metadata). The next step was to clean the dataset using automatic clustering with linguistic representation to identify unmatched videos and noisy keywords. Using these keywords to exclude videos, we finally obtained a dataset that was reduced by 95%, i.e., it contained 35,730 video transcriptions. Then, we again automatically clustered the videos using a lexical representation and split the dataset into subsets, leading to hundreds of clusters that we interpreted manually to identify a hierarchy of topics of interest concerning whistleblowing. We used the dataset to learn a lexical representation for a specific topic and to detect unknown whistleblowing videos for this topic; the accuracy of this detection is 57.4%. We also used the dataset to identify interesting context linguistic markers around the names of whistleblowers. From a given list of names, we automatically extracted all 5-gram word sequences from the dataset and identified interesting markers in the left and right contexts for each name by manual interpretation. The results of our study are the following: a dataset (raw and cleaned collections) concerning whistleblowing, a hierarchy of topics about whistleblowing, the automatic prediction of whistleblowing and the semi-automatic semantic analysis of markers around whistleblower names. This text mining analysis can be exploited for digital sociology and e-democracy studies. |
format | Online Article Text |
id | pubmed-9520105 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-9520105 2022-09-29 Net activism and whistleblowing on YouTube: a text mining analysis Turenne, Nicolas Multimed Tools Appl 1209: Recent Advances on Social Media Analytics and Multimedia Systems: Issues and Challenges Social media is more and more dominant in everyday life for people around the world. YouTube content is a resource that may be useful, in social computational science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video transcriptions concerning net activism and whistleblowing. We automatically performed linguistic feature extraction to capture a representation of each video using its title, description and transcription (downloaded metadata). The next step was to clean the dataset using automatic clustering with linguistic representation to identify unmatched videos and noisy keywords. Using these keywords to exclude videos, we finally obtained a dataset that was reduced by 95%, i.e., it contained 35,730 video transcriptions. Then, we again automatically clustered the videos using a lexical representation and split the dataset into subsets, leading to hundreds of clusters that we interpreted manually to identify a hierarchy of topics of interest concerning whistleblowing. We used the dataset to learn a lexical representation for a specific topic and to detect unknown whistleblowing videos for this topic; the accuracy of this detection is 57.4%. We also used the dataset to identify interesting context linguistic markers around the names of whistleblowers. From a given list of names, we automatically extracted all 5-gram word sequences from the dataset and identified interesting markers in the left and right contexts for each name by manual interpretation. The results of our study are the following: a dataset (raw and cleaned collections) concerning whistleblowing, a hierarchy of topics about whistleblowing, the automatic prediction of whistleblowing and the semi-automatic semantic analysis of markers around whistleblower names. This text mining analysis can be exploited for digital sociology and e-democracy studies. Springer US 2022-09-29 2023 /pmc/articles/PMC9520105/ /pubmed/36193288 http://dx.doi.org/10.1007/s11042-022-13777-0 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | 1209: Recent Advances on Social Media Analytics and Multimedia Systems: Issues and Challenges Turenne, Nicolas Net activism and whistleblowing on YouTube: a text mining analysis |
title | Net activism and whistleblowing on YouTube: a text mining analysis |
title_full | Net activism and whistleblowing on YouTube: a text mining analysis |
title_fullStr | Net activism and whistleblowing on YouTube: a text mining analysis |
title_full_unstemmed | Net activism and whistleblowing on YouTube: a text mining analysis |
title_short | Net activism and whistleblowing on YouTube: a text mining analysis |
title_sort | net activism and whistleblowing on youtube: a text mining analysis |
topic | 1209: Recent Advances on Social Media Analytics and Multimedia Systems: Issues and Challenges |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520105/ https://www.ncbi.nlm.nih.gov/pubmed/36193288 http://dx.doi.org/10.1007/s11042-022-13777-0 |
work_keys_str_mv | AT turennenicolas netactivismandwhistleblowingonyoutubeatextmininganalysis |
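
The description above mentions extracting 5-gram word sequences around whistleblower names and inspecting their left and right contexts. The following is a minimal sketch of that idea, not the author's implementation: the function name `name_contexts`, the simple regex tokenizer, and the sample sentence are all assumptions made for illustration. For a name of k tokens, it counts the up to 5 - k words that can share a 5-gram with the name on each side.

```python
import re
from collections import Counter


def name_contexts(text, name, n=5):
    """Count words that co-occur with `name` inside n-word sequences.

    Hypothetical helper: returns (left_counts, right_counts), the words
    found immediately before and after each occurrence of the name.
    """
    # Assumed tokenizer: lowercase alphanumeric runs (and apostrophes).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    name_tokens = name.lower().split()
    k = len(name_tokens)
    width = n - k  # context words that fit in an n-gram with the name
    left, right = Counter(), Counter()
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == name_tokens:
            left.update(tokens[max(0, i - width):i])
            right.update(tokens[i + k:i + k + width])
    return left, right


# Illustrative input only; not taken from the dataset.
sample = (
    "The whistleblower Edward Snowden revealed secret documents. "
    "Many people say Edward Snowden revealed the truth."
)
left_counts, right_counts = name_contexts(sample, "Edward Snowden")
print(left_counts.most_common(3))
print(right_counts.most_common(3))
```

In the study, candidate markers were then interpreted manually; a frequency table like this one would only be a starting point for that step.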