Cargando…

PHDD: Corpus of Physical Health Data Disclosure on Twitter During COVID-19 Pandemic

Health-related information is considered as ‘highly sensitive’ by the European General Data Protection Regulations (GDPR) and determining whether a text document contains health-related information or not is of interest for both individuals and companies in a number of different scenarios. Although...

Descripción completa

Detalles Bibliográficos
Autores principales: Saniei, Rana, Rodríguez Doncel, Víctor
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer Nature Singapore 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8983320/
https://www.ncbi.nlm.nih.gov/pubmed/35400014
http://dx.doi.org/10.1007/s42979-022-01097-x
Descripción
Sumario:Health-related information is considered as ‘highly sensitive’ by the European General Data Protection Regulations (GDPR) and determining whether a text document contains health-related information or not is of interest for both individuals and companies in a number of different scenarios. Although some efforts have been made to detect different categories of personal data in texts, including health information, the classification task by machines is still challenging. In this work, we aim to contribute to solving this challenge by building a corpus of tweets being shared in the current COVID-19 pandemic context. The corpus is called PHDD(Corpus of Physical Health Data Disclosure on Twitter During COVID-19 Pandemic) and contains 1,494 tweets which have been manually tagged by three taggers in three dimensions: health-sensitivity status, categories of health information, and subject of health history. Furthermore, a lightweight ontology called PTHI(Privacy Tags for Health Information), which reuses two other vocabularies, namely hl7 and dpv, is built to represent the corpus in a machine-readable format. The corpus is publicly available and can be used by NLP experts for implementation of techniques to detect sensitive health information in textual documents.