Analysis of data dictionary formats of HIV clinical trials
BACKGROUND: Efforts to define research Common Data Elements try to harmonize data collection across clinical studies. OBJECTIVE: Our goal was to analyze the quality and usability of data dictionaries of HIV studies. METHODS: For the clinical domain of HIV, we searched data sharing platforms and acqu...
Main Authors: | Mayer, Craig S.; Williams, Nick; Huser, Vojtech |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7535029/ https://www.ncbi.nlm.nih.gov/pubmed/33017454 http://dx.doi.org/10.1371/journal.pone.0240047 |
_version_ | 1783590402877554688 |
---|---|
author | Mayer, Craig S.; Williams, Nick; Huser, Vojtech |
author_facet | Mayer, Craig S.; Williams, Nick; Huser, Vojtech |
author_sort | Mayer, Craig S. |
collection | PubMed |
description | BACKGROUND: Efforts to define research Common Data Elements try to harmonize data collection across clinical studies. OBJECTIVE: Our goal was to analyze the quality and usability of data dictionaries of HIV studies. METHODS: For the clinical domain of HIV, we searched data sharing platforms and acquired a set of 18 HIV related studies from which we analyzed 26 328 data elements. We identified existing standards for creating a data dictionary and reviewed their use. To facilitate aggregation across studies, we defined three types of data dictionary (data element, forms, and permissible values) and created a simple information model for each type. RESULTS: An average study had 427 data elements (ranging from 46 elements to 9 945 elements). In terms of data type, 48.6% of data elements were string, 47.8% were numeric, 3.0% were date and 0.6% were date-time. No study in our sample explicitly declared a data element as a categorical variable and rather considered them either strings or numeric. Only for 61% of studies were we able to obtain permissible values. The majority of studies used CSV files to share a data dictionary while 22% of the studies used a non-computable, PDF format. All studies grouped their data elements. The average number of groups or forms per study was 24 (ranging between 2 and 124 groups/forms). An accurate and well formatted data dictionary facilitates error-free secondary analysis and can help with data de-identification. CONCLUSION: We saw features of data dictionaries that made them difficult to use and understand. This included multiple data dictionary files or non-machine-readable documents, data elements included in data but not in the dictionary or missing data types or descriptions. Building on experience with aggregating data elements across a large set of studies, we created a set of recommendations (called CONSIDER statement) that can guide optimal data sharing of future studies. |
format | Online Article Text |
id | pubmed-7535029 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7535029 2020-10-15 Analysis of data dictionary formats of HIV clinical trials Mayer, Craig S.; Williams, Nick; Huser, Vojtech PLoS One Research Article Public Library of Science 2020-10-05 /pmc/articles/PMC7535029/ /pubmed/33017454 http://dx.doi.org/10.1371/journal.pone.0240047 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
spellingShingle | Research Article Mayer, Craig S. Williams, Nick Huser, Vojtech Analysis of data dictionary formats of HIV clinical trials |
title | Analysis of data dictionary formats of HIV clinical trials |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7535029/ https://www.ncbi.nlm.nih.gov/pubmed/33017454 http://dx.doi.org/10.1371/journal.pone.0240047 |