
Analysis and classification of privacy-sensitive content in social media posts

Bibliographic Details
Main authors: Bioglio, Livio; Pensa, Ruggero G.
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8892403/
https://www.ncbi.nlm.nih.gov/pubmed/35261872
http://dx.doi.org/10.1140/epjds/s13688-022-00324-y
author Bioglio, Livio
Pensa, Ruggero G.
collection PubMed
description User-generated content often contains private information, even when it is shared publicly on social media and on the web in general. Although many filtering and natural language approaches for automatically detecting obscenities or hate speech have been proposed, determining whether a shared post contains sensitive information is still an open issue. The problem has been addressed by assuming, for instance, that sensitive content is published anonymously, on anonymous social media platforms, or with more restrictive privacy settings, but these assumptions are far from realistic, since the authors of posts often underestimate or overlook their actual exposure to privacy risks. Hence, in this paper, we address the problem of content sensitivity analysis directly by presenting and characterizing a new annotated corpus of around ten thousand posts, each annotated as sensitive or non-sensitive by a pool of experts. We characterize our data with respect to the closely related problem of self-disclosure, pointing out the main differences between the two tasks. We also present the results of several deep neural network models that outperform previous naive attempts at classifying social media posts according to their sensitivity, and show that state-of-the-art approaches based on anonymity and lexical analysis do not work in realistic application scenarios.
format Online
Article
Text
id pubmed-8892403
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-8892403 2022-03-04 Analysis and classification of privacy-sensitive content in social media posts Bioglio, Livio Pensa, Ruggero G. EPJ Data Sci Regular Article
Springer Berlin Heidelberg 2022-03-03 2022 /pmc/articles/PMC8892403/ /pubmed/35261872 http://dx.doi.org/10.1140/epjds/s13688-022-00324-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title Analysis and classification of privacy-sensitive content in social media posts
topic Regular Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8892403/
https://www.ncbi.nlm.nih.gov/pubmed/35261872
http://dx.doi.org/10.1140/epjds/s13688-022-00324-y