PHDD: Corpus of Physical Health Data Disclosure on Twitter During COVID-19 Pandemic
Health-related information is considered ‘highly sensitive’ by the European General Data Protection Regulation (GDPR), and determining whether a text document contains health-related information is of interest to both individuals and companies in a number of different scenarios. Although...
Main authors: | Saniei, Rana; Rodríguez Doncel, Víctor |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Nature Singapore 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8983320/ https://www.ncbi.nlm.nih.gov/pubmed/35400014 http://dx.doi.org/10.1007/s42979-022-01097-x |
_version_ | 1784681961324806144 |
---|---|
author | Saniei, Rana; Rodríguez Doncel, Víctor |
author_facet | Saniei, Rana; Rodríguez Doncel, Víctor |
author_sort | Saniei, Rana |
collection | PubMed |
description | Health-related information is considered ‘highly sensitive’ by the European General Data Protection Regulation (GDPR), and determining whether a text document contains health-related information is of interest to both individuals and companies in a number of different scenarios. Although some efforts have been made to detect different categories of personal data in texts, including health information, classification by machines remains challenging. In this work, we aim to contribute to solving this challenge by building a corpus of tweets shared in the context of the current COVID-19 pandemic. The corpus is called PHDD (Corpus of Physical Health Data Disclosure on Twitter During COVID-19 Pandemic) and contains 1,494 tweets that have been manually tagged by three taggers in three dimensions: health-sensitivity status, categories of health information, and subject of health history. Furthermore, a lightweight ontology called PTHI (Privacy Tags for Health Information), which reuses two other vocabularies, namely hl7 and dpv, is built to represent the corpus in a machine-readable format. The corpus is publicly available and can be used by NLP experts to implement techniques for detecting sensitive health information in textual documents. |
format | Online Article Text |
id | pubmed-8983320 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Nature Singapore |
record_format | MEDLINE/PubMed |
spelling | pubmed-8983320 2022-04-06 PHDD: Corpus of Physical Health Data Disclosure on Twitter During COVID-19 Pandemic Saniei, Rana; Rodríguez Doncel, Víctor SN Comput Sci Original Research Springer Nature Singapore 2022-04-06 2022 /pmc/articles/PMC8983320/ /pubmed/35400014 http://dx.doi.org/10.1007/s42979-022-01097-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | PHDD: Corpus of Physical Health Data Disclosure on Twitter During COVID-19 Pandemic |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8983320/ https://www.ncbi.nlm.nih.gov/pubmed/35400014 http://dx.doi.org/10.1007/s42979-022-01097-x |