Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading

BACKGROUND: Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologie...

Bibliographic Details
Main Authors: Ong, Toan C., Kahn, Michael G., Kwan, Bethany M., Yamashita, Traci, Brandt, Elias, Hosokawa, Patrick, Uhrich, Chris, Schilling, Lisa M.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5598056/
https://www.ncbi.nlm.nih.gov/pubmed/28903729
http://dx.doi.org/10.1186/s12911-017-0532-3
_version_ 1783263824320659456
author Ong, Toan C.
Kahn, Michael G.
Kwan, Bethany M.
Yamashita, Traci
Brandt, Elias
Hosokawa, Patrick
Uhrich, Chris
Schilling, Lisa M.
author_sort Ong, Toan C.
collection PubMed
description BACKGROUND: Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to transform data to conform to network requirements are beyond the scope of many health care organizations, and there is a need for practical tools that lower the barriers to data contribution to clinical research networks. METHODS: We designed and implemented a health data transformation and loading approach, which we refer to as Dynamic ETL (Extraction, Transformation and Loading) (D-ETL), that automates part of the process through use of scalable, reusable and customizable code, while retaining manual aspects of the process that require knowledge of complex coding syntax. This approach provides the flexibility required for the ETL of heterogeneous data, variations in semantic expertise, and transparency of transformation logic that are essential to implement ETL conventions across clinical research sharing networks. Processing workflows are directed by the ETL specifications guideline, developed by ETL designers with extensive knowledge of the structure and semantics of health data (i.e., “health data domain experts”) and of the target common data model. RESULTS: D-ETL was implemented to perform ETL operations that load data from various sources with different database schema structures into the Observational Medical Outcomes Partnership (OMOP) common data model. The results showed that ETL rule composition methods and the D-ETL engine offer a scalable solution for health data transformation via automatic query generation to harmonize source datasets. CONCLUSIONS: D-ETL supports a flexible and transparent process to transform and load health data into a target data model. 
This approach offers a solution that lowers technical barriers that prevent data partners from participating in research data networks and therefore promotes the advancement of comparative effectiveness research using secondary electronic health data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-017-0532-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5598056
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5598056 2017-09-18 Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading Ong, Toan C. Kahn, Michael G. Kwan, Bethany M. Yamashita, Traci Brandt, Elias Hosokawa, Patrick Uhrich, Chris Schilling, Lisa M. BMC Med Inform Decis Mak Technical Advance BioMed Central 2017-09-13 /pmc/articles/PMC5598056/ /pubmed/28903729 http://dx.doi.org/10.1186/s12911-017-0532-3 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5598056/
https://www.ncbi.nlm.nih.gov/pubmed/28903729
http://dx.doi.org/10.1186/s12911-017-0532-3