Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation
BACKGROUND: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are...
Main authors: | Spengler, Helmut; Lang, Claudia; Mahapatra, Tanmaya; Gatz, Ingrid; Kuhn, Klaus A; Prasser, Fabian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7404007/ https://www.ncbi.nlm.nih.gov/pubmed/32706673 http://dx.doi.org/10.2196/15918 |
_version_ | 1783567056647487488 |
---|---|
author | Spengler, Helmut Lang, Claudia Mahapatra, Tanmaya Gatz, Ingrid Kuhn, Klaus A Prasser, Fabian |
author_sort | Spengler, Helmut |
collection | PubMed |
description | BACKGROUND: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components that provide users with unified access to the large heterogeneous data sets needed to realize this and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis. OBJECTIVE: Often, different warehousing platforms are needed to support different use cases and different types of data. Moreover, to achieve an optimal data representation within the target systems, specific domain knowledge is needed when designing data-loading processes. Consequently, informaticians need to work closely with clinicians and researchers in short iterations. This is a challenging task as installing and maintaining warehousing platforms can be complex and time consuming. Furthermore, data loading typically requires significant effort in terms of data preprocessing, cleansing, and restructuring. The platform described in this study aims to address these challenges. METHODS: We formulated system requirements to achieve agility in terms of platform management and data loading. The derived system architecture includes a cloud infrastructure with unified management interfaces for multiple warehouse platforms and a data-loading pipeline with a declarative configuration paradigm and meta-loading approach. The latter compiles data and configuration files into forms required by existing loading tools, thereby automating a wide range of data restructuring and cleansing tasks. We demonstrated the fulfillment of the requirements and the originality of our approach by an experimental evaluation and a comparison with previous work. RESULTS: The platform supports both i2b2 and tranSMART with built-in security. Our experiments showed that the loading pipeline accepts input data that cannot be loaded with existing tools without preprocessing. Moreover, it lowered efforts significantly, reducing the size of configuration files required by factors of up to 22 for tranSMART and 1135 for i2b2. The time required to perform the compilation process was roughly equivalent to the time required for actual data loading. Comparison with other tools showed that our solution was the only tool fulfilling all requirements. CONCLUSIONS: Our platform significantly reduces the efforts required for managing clinical and translational warehouses and for loading data in various formats and structures, such as complex entity-attribute-value structures often found in laboratory data. Moreover, it facilitates the iterative refinement of data representations in the target platforms, as the required configuration files are very compact. The quantitative measurements presented are consistent with our experiences of significantly reduced efforts for building warehousing platforms in close cooperation with medical researchers. Both the cloud-based hosting infrastructure and the data-loading pipeline are available to the community as open source software with comprehensive documentation. |
format | Online Article Text |
id | pubmed-7404007 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7404007 2020-08-17 Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation Spengler, Helmut Lang, Claudia Mahapatra, Tanmaya Gatz, Ingrid Kuhn, Klaus A Prasser, Fabian JMIR Med Inform Original Paper JMIR Publications 2020-07-21 /pmc/articles/PMC7404007/ /pubmed/32706673 http://dx.doi.org/10.2196/15918 Text en ©Helmut Spengler, Claudia Lang, Tanmaya Mahapatra, Ingrid Gatz, Klaus A Kuhn, Fabian Prasser. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.07.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Spengler, Helmut Lang, Claudia Mahapatra, Tanmaya Gatz, Ingrid Kuhn, Klaus A Prasser, Fabian Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation |
title | Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7404007/ https://www.ncbi.nlm.nih.gov/pubmed/32706673 http://dx.doi.org/10.2196/15918 |