An Extract-Transform-Load Process Design for the Incremental Loading of German Real-World Data Based on FHIR and OMOP CDM: Algorithm Development and Validation

BACKGROUND: In the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium, an IT-based clinical trial recruitment support system was developed based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Currently, OMOP CDM is populated with G...

Bibliographic Details
Main Authors: Henke, Elisa, Peng, Yuan, Reinecke, Ines, Zoch, Michéle, Sedlmayr, Martin, Bathelt, Franziska
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10466444/
https://www.ncbi.nlm.nih.gov/pubmed/37621207
http://dx.doi.org/10.2196/47310
author Henke, Elisa
Peng, Yuan
Reinecke, Ines
Zoch, Michéle
Sedlmayr, Martin
Bathelt, Franziska
collection PubMed
description BACKGROUND: In the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium, an IT-based clinical trial recruitment support system was developed based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Currently, OMOP CDM is populated with German Fast Healthcare Interoperability Resources (FHIR) using an Extract-Transform-Load (ETL) process, which was designed as a bulk load. However, the computational effort of a daily full load is not efficient for daily recruitment. OBJECTIVE: The aim of this study is to extend our existing ETL process with the option of incremental loading to efficiently support daily updated data. METHODS: Based on our existing bulk ETL process, we performed an analysis to determine the requirements of incremental loading. Furthermore, a literature review was conducted to identify adaptable approaches. Based on this, we implemented three methods to integrate incremental loading into our ETL process. Lastly, a test suite was defined to evaluate the incremental loading for data correctness and performance compared to bulk loading. RESULTS: The resulting ETL process supports bulk and incremental loading. Performance tests show that, for 1 day of changes, the incremental load took 87.5% less execution time than the bulk load (2.12 min compared to 17.07 min), with no data differences in OMOP CDM. CONCLUSIONS: Since incremental loading is more efficient than a daily bulk load and both loading options result in the same amount of data, we recommend using bulk load for an initial load and switching to incremental load for daily updates. The resulting incremental ETL logic can be applied internationally since it is not restricted to German FHIR profiles.
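The abstract contrasts a bulk load with an incremental load but does not describe the three implemented methods. The following is only a minimal sketch of the general idea, assuming a timestamp-based change filter. It is not the authors' ETL logic. The record layout (`id`, `birthDate`, `lastUpdated`, and a `deleted` flag), the `transform` mapping, and all function names are hypothetical simplifications and do not reflect real FHIR or OMOP CDM structures.

```python
from datetime import datetime, timezone


def transform(resource):
    """Map a simplified FHIR-like record to a simplified OMOP-like row (hypothetical mapping)."""
    return {
        "person_source_value": resource["id"],
        "year_of_birth": int(resource["birthDate"][:4]),
    }


def bulk_load(resources):
    """Rebuild the whole target from scratch, skipping records flagged as deleted."""
    return {r["id"]: transform(r) for r in resources if not r.get("deleted", False)}


def incremental_load(target, resources, since):
    """Apply only records changed after `since` to an existing target.

    Returns the number of records that were actually processed.
    """
    touched = 0
    for r in resources:
        if r["lastUpdated"] <= since:
            continue  # unchanged since the last run: skip
        touched += 1
        if r.get("deleted", False):
            target.pop(r["id"], None)  # remove rows for deleted source records
        else:
            target[r["id"]] = transform(r)  # insert new rows or overwrite changed ones
    return touched


def ts(day):
    """Helper for the demo: a UTC timestamp in August 2023."""
    return datetime(2023, 8, day, tzinfo=timezone.utc)


# Day 1: initial bulk load of three records.
day1 = [
    {"id": "p1", "birthDate": "1980-01-01", "lastUpdated": ts(1)},
    {"id": "p2", "birthDate": "1975-05-05", "lastUpdated": ts(1)},
    {"id": "p3", "birthDate": "1990-09-09", "lastUpdated": ts(1)},
]
target = bulk_load(day1)

# Day 2: p2 was corrected, p3 was deleted, p4 is new, p1 is unchanged.
day2 = [
    day1[0],
    {"id": "p2", "birthDate": "1976-05-05", "lastUpdated": ts(2)},
    {"id": "p3", "birthDate": "1990-09-09", "lastUpdated": ts(2), "deleted": True},
    {"id": "p4", "birthDate": "2001-03-03", "lastUpdated": ts(2)},
]
touched = incremental_load(target, day2, since=ts(1))

# Correctness check in the spirit of the abstract: both loading options
# should end with the same target content.
same_as_bulk = target == bulk_load(day2)
```

In this sketch the incremental run processes only the three changed records and leaves the unchanged one alone, yet ends with the same target as a fresh bulk load of the day 2 data. That mirrors the comparison described in the abstract, where incremental and bulk loading are checked against each other for data correctness.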
format Online
Article
Text
id pubmed-10466444
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10466444 2023-08-31 JMIR Med Inform Original Paper JMIR Publications 2023-08-21 /pmc/articles/PMC10466444/ /pubmed/37621207 http://dx.doi.org/10.2196/47310 Text en © Elisa Henke, Yuan Peng, Ines Reinecke, Michéle Zoch, Martin Sedlmayr, Franziska Bathelt. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 21.8.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title An Extract-Transform-Load Process Design for the Incremental Loading of German Real-World Data Based on FHIR and OMOP CDM: Algorithm Development and Validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10466444/
https://www.ncbi.nlm.nih.gov/pubmed/37621207
http://dx.doi.org/10.2196/47310