Reengineering Workflow for Curation of DICOM Datasets

Reusable, publicly available data is a pillar of open science and rapid advancement of cancer imaging research. Sharing data from completed research studies not only saves research dollars required to collect data, but also helps ensure that studies are both replicable and reproducible. The Cancer I...

Bibliographic Details
Main Authors:	Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2018
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261183/ https://www.ncbi.nlm.nih.gov/pubmed/29907888 http://dx.doi.org/10.1007/s10278-018-0097-4

_version_	1783374933194178560
author	Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter
author_facet	Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter
author_sort	Bennett, William
collection	PubMed
description	Reusable, publicly available data is a pillar of open science and rapid advancement of cancer imaging research. Sharing data from completed research studies not only saves research dollars required to collect data, but also helps ensure that studies are both replicable and reproducible. The Cancer Imaging Archive (TCIA) is a global shared repository for imaging data related to cancer. Ensuring the consistency, scientific utility, and anonymity of data stored in TCIA is of utmost importance. As the rate of submission to TCIA has increased, in both the volume and the complexity of DICOM objects stored, curation of collections has become a bottleneck in the acquisition of data. To increase the rate of curation of image sets, improve the quality of the curation, and better track the provenance of changes made to submitted DICOM image sets, a custom set of tools was developed, using novel methods for the analysis of DICOM data sets. These tools are written in the programming language Perl, use the open-source database PostgreSQL, make use of the Perl DICOM routines in the open-source package Posda, and incorporate DICOM diagnostic tools from other open-source packages, such as dicom3tools. These tools are referred to as the “Posda Tools.” The Posda Tools are open source and available via git at https://github.com/UAMS-DBMI/PosdaTools. 
In this paper, we briefly describe the Posda Tools and discuss the novel methods employed by these tools to facilitate rapid analysis of DICOM data, including the following: (1) using a database schema that is more permissive than, and normalized differently from, traditional DICOM databases; (2) performing integrity checks automatically on a bulk basis; (3) applying revisions to DICOM datasets on a bulk basis, either through a web-based interface or via command-line executable Perl scripts; (4) tracking all such edits in a revision tracker so that they may be rolled back; (5) providing a UI to inspect the results of such edits and verify that they are what was intended; (6) identifying DICOM Studies, Series, and SOP instances using “nicknames” that are persistent and have well-defined scope, which makes reported DICOM errors easier to express and manage; and (7) rapidly identifying potential duplicate DICOM datasets by pixel data; this can be used, e.g., to identify submission subjects which may relate to the same individual, without identifying the individual.
format	Online Article Text
id	pubmed-6261183
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-6261183 2018-12-11 Reengineering Workflow for Curation of DICOM Datasets Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter J Digit Imaging Article Springer International Publishing 2018-06-15 2018-12 /pmc/articles/PMC6261183/ /pubmed/29907888 http://dx.doi.org/10.1007/s10278-018-0097-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle	Article Bennett, William Smith, Kirk Jarosz, Quasar Nolan, Tracy Bosch, Walter Reengineering Workflow for Curation of DICOM Datasets
title	Reengineering Workflow for Curation of DICOM Datasets
title_full	Reengineering Workflow for Curation of DICOM Datasets
title_fullStr	Reengineering Workflow for Curation of DICOM Datasets
title_full_unstemmed	Reengineering Workflow for Curation of DICOM Datasets
title_short	Reengineering Workflow for Curation of DICOM Datasets
title_sort	reengineering workflow for curation of dicom datasets
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261183/ https://www.ncbi.nlm.nih.gov/pubmed/29907888 http://dx.doi.org/10.1007/s10278-018-0097-4
work_keys_str_mv	AT bennettwilliam reengineeringworkflowforcurationofdicomdatasets AT smithkirk reengineeringworkflowforcurationofdicomdatasets AT jaroszquasar reengineeringworkflowforcurationofdicomdatasets AT nolantracy reengineeringworkflowforcurationofdicomdatasets AT boschwalter reengineeringworkflowforcurationofdicomdatasets
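
Item (7) of the abstract describes identifying potential duplicate DICOM datasets by pixel data. The record does not say how the Posda Tools do this, so the following is only a minimal sketch of one plausible approach, written in Python rather than the tools' own Perl: hash the raw pixel bytes of each instance and group the instances that share a digest. The function names, the labels, and the choice of SHA-256 are all assumptions made for illustration, and the sketch takes pixel bytes directly instead of parsing DICOM files.

```python
import hashlib
from collections import defaultdict


def pixel_digest(pixel_bytes: bytes) -> str:
    """Return a hex digest computed from the raw pixel data only.

    Header fields (patient name, IDs, dates) are deliberately not part of the
    input, so two instances with identical pixels match even when their
    identifying metadata differs. SHA-256 is an assumption, not a documented
    Posda choice.
    """
    return hashlib.sha256(pixel_bytes).hexdigest()


def find_potential_duplicates(instances):
    """Group instance labels whose pixel data hashes to the same digest.

    `instances` is an iterable of (label, pixel_bytes) pairs. The labels are
    hypothetical stand-ins for SOP instance identifiers or nicknames.
    """
    groups = defaultdict(list)
    for label, pixel_bytes in instances:
        groups[pixel_digest(pixel_bytes)].append(label)
    # Only digests shared by two or more instances are duplicate candidates.
    return [labels for labels in groups.values() if len(labels) > 1]


if __name__ == "__main__":
    # Made-up byte strings standing in for decoded pixel data.
    sample = [
        ("subject_A/image_1", b"\x00\x01\x02\x03"),
        ("subject_B/image_7", b"\x00\x01\x02\x03"),  # same pixels, other subject
        ("subject_C/image_2", b"\x09\x08\x07\x06"),
    ]
    for group in find_potential_duplicates(sample):
        print(group)
```

Because only the pixel bytes are hashed, a match can suggest that two submission subjects relate to the same individual without exposing who that individual is. An exact-digest comparison finds only byte-identical pixel data, so any match is a candidate for review rather than a confirmed duplicate.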
