Reengineering Workflow for Curation of DICOM Datasets

Reusable, publicly available data is a pillar of open science and rapid advancement of cancer imaging research. Sharing data from completed research studies not only saves research dollars required to collect data, but also helps ensure that studies are both replicable and reproducible. The Cancer I...

Bibliographic Details
Main authors: Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261183/
https://www.ncbi.nlm.nih.gov/pubmed/29907888
http://dx.doi.org/10.1007/s10278-018-0097-4
_version_ 1783374933194178560
author Bennett, William
Smith, Kirk
Jarosz, Quasar
Nolan, Tracy
Bosch, Walter
author_sort Bennett, William
collection PubMed
description Reusable, publicly available data is a pillar of open science and rapid advancement of cancer imaging research. Sharing data from completed research studies not only saves research dollars required to collect data, but also helps ensure that studies are both replicable and reproducible. The Cancer Imaging Archive (TCIA) is a global shared repository for imaging data related to cancer. Ensuring the consistency, scientific utility, and anonymity of data stored in TCIA is of utmost importance. As the rate of submission to TCIA has increased, both in volume and in complexity of the DICOM objects stored, curation of collections has become a bottleneck in the acquisition of data. To increase the rate of curation of image sets, improve the quality of the curation, and better track the provenance of changes made to submitted DICOM image sets, a custom set of tools was developed, using novel methods for the analysis of DICOM data sets. These tools are written in the programming language Perl, use the open-source database PostgreSQL, make use of the Perl DICOM routines in the open-source package Posda, and incorporate DICOM diagnostic tools from other open-source packages, such as dicom3tools. These tools are referred to as the “Posda Tools.” The Posda Tools are open source and available via git at https://github.com/UAMS-DBMI/PosdaTools.
In this paper, we briefly describe the Posda Tools and discuss the novel methods employed by these tools to facilitate rapid analysis of DICOM data, including the following: (1) using a database schema that is more permissive than, and normalized differently from, traditional DICOM databases; (2) performing integrity checks automatically in bulk; (3) applying revisions to DICOM datasets in bulk, either through a web-based interface or via command-line Perl scripts; (4) tracking all such edits in a revision tracker so that they may be rolled back; (5) providing a UI to inspect the results of such edits and verify that they are what was intended; (6) identifying DICOM Studies, Series, and SOP instances by “nicknames” that are persistent and have well-defined scope, which makes reported DICOM errors easier to express and manage; and (7) rapidly identifying potential duplicate DICOM datasets by pixel data, which can be used, e.g., to identify submission subjects that may relate to the same individual without identifying the individual.
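Item (7) above, duplicate detection by pixel data, can be illustrated with a minimal sketch. This is not the Posda Tools implementation (those tools are written in Perl and the abstract does not say how pixel data is compared). The sketch assumes that the raw pixel bytes of each file have already been extracted, and that comparing SHA-256 digests of those bytes is an acceptable stand-in for the comparison. The function names `pixel_digest` and `find_duplicate_groups` and the sample records are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One record per file: (subject_id, file_name, raw_pixel_bytes).
# Hypothetical layout; extracting pixel bytes from DICOM files is out of scope here.
FileRecord = Tuple[str, str, bytes]


def pixel_digest(pixel_data: bytes) -> str:
    """Digest of the pixel bytes only, so header differences are ignored.

    SHA-256 is an assumption of this sketch, not a detail from the paper.
    """
    return hashlib.sha256(pixel_data).hexdigest()


def find_duplicate_groups(files: Iterable[FileRecord]) -> Dict[str, List[Tuple[str, str]]]:
    """Group files by pixel digest and keep only digests shared by 2+ files."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject_id, file_name, pixel_data in files:
        groups[pixel_digest(pixel_data)].append((subject_id, file_name))
    return {digest: members for digest, members in groups.items() if len(members) > 1}


# Usage: two subjects submit files with identical pixel payloads.
sample_files = [
    ("subject-A", "img001.dcm", b"\x00\x01\x02\x03"),
    ("subject-B", "img777.dcm", b"\x00\x01\x02\x03"),
    ("subject-C", "img042.dcm", b"\xff\xfe\xfd\xfc"),
]
duplicates = find_duplicate_groups(sample_files)
for digest, members in duplicates.items():
    print(digest[:12], members)
```

Because only a digest of the pixel bytes is compared, two submissions can be flagged as possibly relating to the same individual without consulting any identifying header elements.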
format Online
Article
Text
id pubmed-6261183
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6261183 2018-12-11 Reengineering Workflow for Curation of DICOM Datasets Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter J Digit Imaging Article Springer International Publishing 2018-06-15 2018-12 /pmc/articles/PMC6261183/ /pubmed/29907888 http://dx.doi.org/10.1007/s10278-018-0097-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Reengineering Workflow for Curation of DICOM Datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261183/
https://www.ncbi.nlm.nih.gov/pubmed/29907888
http://dx.doi.org/10.1007/s10278-018-0097-4