A Policy-Driven Approach to Secure Extraction of COVID-19 Data From Research Papers

The entire scientific and academic community has been mobilized to gain a better understanding of the COVID-19 disease and its impact on humanity. Most research related to COVID-19 needs to analyze large amounts of data in very little time. This urgency has made Big Data Analysis, and related questi...

Bibliographic Details
Main Authors: Elluri, Lavanya; Piplai, Aritran; Kotal, Anantaa; Joshi, Anupam; Joshi, Karuna Pande
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8387600/
https://www.ncbi.nlm.nih.gov/pubmed/34458724
http://dx.doi.org/10.3389/fdata.2021.701966
author Elluri, Lavanya
Piplai, Aritran
Kotal, Anantaa
Joshi, Anupam
Joshi, Karuna Pande
collection PubMed
description The entire scientific and academic community has been mobilized to gain a better understanding of the COVID-19 disease and its impact on humanity. Most research related to COVID-19 needs to analyze large amounts of data in very little time. This urgency has made Big Data Analysis, and related questions around the privacy and security of the data, an extremely important part of research in the COVID-19 era. The White House OSTP has, for example, released a large dataset of papers related to COVID research from which the research community can extract knowledge and information. We show an example system with a machine learning-based knowledge extractor that draws out key medical information from COVID-19-related academic research papers. We represent this knowledge in a Knowledge Graph that uses the Unified Medical Language System (UMLS). However, publicly available studies rely on datasets that might contain sensitive data. Extracting information from academic papers can potentially leak sensitive data, and protecting the security and privacy of this data is equally important. In this paper, we address the key challenges around the privacy and security of such information extraction and analysis systems. Regulations like HIPAA have updated their guidelines for securely accessing data, specifically data related to COVID-19. In the US, healthcare providers must also comply with the Office for Civil Rights (OCR) rules to protect data integrity in matters like plasma donation, media access to health care data, telehealth communications, etc. Privacy policies are typically short and unstructured HTML or PDF documents. We have created a framework to extract relevant knowledge from the health centers’ policy documents and also represent it as a knowledge graph. Our framework helps to understand the extent to which individual provider policies comply with regulations and to define access control policies that enforce the regulation rules on data in the knowledge graph extracted from COVID-related papers. Along with being compliant, privacy policies must also be transparent and easily understood by the clients. We analyze the relative readability of healthcare privacy policies and discuss the impact. In this paper, we develop a framework for access control decisions that uses policy compliance information to securely retrieve COVID data. We show how policy compliance information can be used to restrict access to COVID-19 data and information extracted from research papers.
format Online
Article
Text
id pubmed-8387600
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8387600 2021-08-27 A Policy-Driven Approach to Secure Extraction of COVID-19 Data From Research Papers Elluri, Lavanya Piplai, Aritran Kotal, Anantaa Joshi, Anupam Joshi, Karuna Pande Front Big Data Big Data Frontiers Media S.A. 2021-08-12 /pmc/articles/PMC8387600/ /pubmed/34458724 http://dx.doi.org/10.3389/fdata.2021.701966 Text en Copyright © 2021 Elluri, Piplai, Kotal, Joshi and Joshi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Policy-Driven Approach to Secure Extraction of COVID-19 Data From Research Papers
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8387600/
https://www.ncbi.nlm.nih.gov/pubmed/34458724
http://dx.doi.org/10.3389/fdata.2021.701966