Analysis and classification of privacy-sensitive content in social media posts
User-generated contents often contain private information, even when they are shared publicly on social media and on the web in general. Although many filtering and natural language approaches for automatically detecting obscenities or hate speech have been proposed, determining whether a shared pos...
Main authors: | Bioglio, Livio; Pensa, Ruggero G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8892403/ https://www.ncbi.nlm.nih.gov/pubmed/35261872 http://dx.doi.org/10.1140/epjds/s13688-022-00324-y |
_version_ | 1784662159193538560 |
---|---|
author | Bioglio, Livio Pensa, Ruggero G. |
author_sort | Bioglio, Livio |
collection | PubMed |
description | User-generated contents often contain private information, even when they are shared publicly on social media and on the web in general. Although many filtering and natural language approaches for automatically detecting obscenities or hate speech have been proposed, determining whether a shared post contains sensitive information is still an open issue. The problem has been addressed by assuming, for instance, that sensitive contents are published anonymously, on anonymous social media platforms or with more restrictive privacy settings, but these assumptions are far from being realistic, since the authors of posts often underestimate or overlook their actual exposure to privacy risks. Hence, in this paper, we address the problem of content sensitivity analysis directly, by presenting and characterizing a new annotated corpus with around ten thousand posts, each one annotated as sensitive or non-sensitive by a pool of experts. We characterize our data with respect to the closely-related problem of self-disclosure, pointing out the main differences between the two tasks. We also present the results of several deep neural network models that outperform previous naive attempts of classifying social media posts according to their sensitivity, and show that state-of-the-art approaches based on anonymity and lexical analysis do not work in realistic application scenarios. |
format | Online Article Text |
id | pubmed-8892403 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-8892403 2022-03-04 Analysis and classification of privacy-sensitive content in social media posts Bioglio, Livio Pensa, Ruggero G. EPJ Data Sci Regular Article User-generated contents often contain private information, even when they are shared publicly on social media and on the web in general. Although many filtering and natural language approaches for automatically detecting obscenities or hate speech have been proposed, determining whether a shared post contains sensitive information is still an open issue. The problem has been addressed by assuming, for instance, that sensitive contents are published anonymously, on anonymous social media platforms or with more restrictive privacy settings, but these assumptions are far from being realistic, since the authors of posts often underestimate or overlook their actual exposure to privacy risks. Hence, in this paper, we address the problem of content sensitivity analysis directly, by presenting and characterizing a new annotated corpus with around ten thousand posts, each one annotated as sensitive or non-sensitive by a pool of experts. We characterize our data with respect to the closely-related problem of self-disclosure, pointing out the main differences between the two tasks. We also present the results of several deep neural network models that outperform previous naive attempts of classifying social media posts according to their sensitivity, and show that state-of-the-art approaches based on anonymity and lexical analysis do not work in realistic application scenarios. 
Springer Berlin Heidelberg 2022-03-03 2022 /pmc/articles/PMC8892403/ /pubmed/35261872 http://dx.doi.org/10.1140/epjds/s13688-022-00324-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Analysis and classification of privacy-sensitive content in social media posts |
topic | Regular Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8892403/ https://www.ncbi.nlm.nih.gov/pubmed/35261872 http://dx.doi.org/10.1140/epjds/s13688-022-00324-y |