Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use

Bibliographic Details
Main authors: Lee, Brian; Dupervil, Brandi; Deputy, Nicholas P.; Duck, Wil; Soroka, Stephen; Bottichio, Lyndsay; Silk, Benjamin; Price, Jason; Sweeney, Patricia; Fuld, Jennifer; Weber, J. Todd; Pollock, Dan
Format: Online article (text)
Language: English
Published: SAGE Publications 2021
Subjects: Public Health Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8216038/
https://www.ncbi.nlm.nih.gov/pubmed/34139910
http://dx.doi.org/10.1177/00333549211026817

Abstract

OBJECTIVES: Federal open-data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and state, tribal, local, and territorial partners. These initiatives advance understanding of health conditions and diseases by providing data to researchers, scientists, and policymakers for analysis, collaboration, and use outside the Centers for Disease Control and Prevention (CDC), particularly for emerging conditions such as COVID-19, for which data needs are constantly evolving. Since the beginning of the pandemic, CDC has collected person-level, de-identified data from jurisdictions and currently has more than 8 million records. We describe how CDC designed and produces 2 de-identified public datasets from these collected data.

METHODS: We included data elements based on usefulness, public request, and privacy implications; we suppressed some field values to reduce the risk of re-identification and exposure of confidential information. We created datasets and verified them for privacy and confidentiality by using data management platform analytic tools and R scripts.

RESULTS: Unrestricted data are available to the public through Data.CDC.gov, and restricted data, with additional fields, are available with a data-use agreement through a private repository on GitHub.com.

PRACTICE IMPLICATIONS: Enriched understanding of the available public data, the methods used to create these data, and the algorithms used to protect the privacy of de-identified people allows for improved data use. Automating data-generation procedures improves the volume and timeliness of sharing data.
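
The METHODS paragraph says that some field values were suppressed to reduce re-identification risk and that the datasets were then verified for privacy, but this record does not state the suppression rule. The following is a minimal sketch of one common approach, k-anonymity-style small-cell suppression followed by a verification check, written in Python rather than the R used by the authors. The field names, the threshold `k`, the suppression order, and the helpers `suppress_rare` and `smallest_group` are all hypothetical illustrations, not CDC's published procedure.

```python
# Hypothetical illustration of small-cell suppression; NOT the CDC procedure.
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# A record maps field names to values; None marks a suppressed value.
Record = Dict[str, Optional[str]]


def group_key(record: Record, fields: Sequence[str]) -> Tuple[Optional[str], ...]:
    """Combination of quasi-identifier values that places a record in a group."""
    return tuple(record.get(f) for f in fields)


def smallest_group(records: List[Record], fields: Sequence[str]) -> int:
    """Size of the rarest quasi-identifier combination (0 for no records)."""
    counts = Counter(group_key(r, fields) for r in records)
    return min(counts.values()) if counts else 0


def suppress_rare(
    records: List[Record],
    fields: Sequence[str],
    k: int,
    suppress_order: Sequence[str],
) -> List[Record]:
    """Suppress values, one field at a time, in records whose quasi-identifier
    combination is shared by fewer than k records. Stops once every combination
    has at least k records (or suppress_order is exhausted); the input list is
    not modified."""
    out = [dict(r) for r in records]  # work on copies
    for field in suppress_order:
        if smallest_group(out, fields) >= k:
            break
        counts = Counter(group_key(r, fields) for r in out)
        for r in out:
            if counts[group_key(r, fields)] < k:
                r[field] = None  # suppress this field for rare combinations
    return out


if __name__ == "__main__":
    # Hypothetical toy data: two quasi-identifiers and k = 3.
    fields = ("age_group", "county")
    rows = (
        [{"age_group": "20-29", "county": "A"} for _ in range(3)]
        + [{"age_group": "20-29", "county": "B"} for _ in range(2)]
        + [{"age_group": "30-39", "county": "B"}]
    )
    released = suppress_rare(rows, fields, k=3, suppress_order=("county", "age_group"))
    # Verification step: every remaining combination must occur at least k times.
    assert smallest_group(released, fields) >= 3
    print(smallest_group(rows, fields), smallest_group(released, fields))
```

Counts are recomputed before each field is suppressed, so fields later in `suppress_order` stay intact whenever suppressing earlier fields already brings every group up to `k`. The separate `smallest_group` check mirrors the idea of verifying a dataset after it is produced rather than trusting the transformation step alone.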

Journal: Public Health Reports (Public Health Rep), Public Health Methodology section
Published online: 2021-06-17
Rights: © 2021, Association of Schools and Programs of Public Health. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits any use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).