Extract, transform, load framework for the conversion of health databases to OMOP

Bibliographic Details
Main authors: Quiroz, Juan C.; Chard, Tim; Sa, Zhisheng; Ritchie, Angus; Jorm, Louisa; Gallego, Blanca
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000122/
https://www.ncbi.nlm.nih.gov/pubmed/35404974
http://dx.doi.org/10.1371/journal.pone.0266911
Collection: PubMed
Abstract: Common data models standardize the structures and semantics of health datasets, enabling reproducibility and large-scale studies that leverage data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time- and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files containing mapping logic into an ETL script. The ETL framework is accessible via a web application that allows users to upload and edit YAML files in a web editor and obtain an ETL SQL script for use in development environments. The structure of the DML maximizes readability, ease of refactoring, and maintainability, while minimizing technical debt and standardizing the writing of ETL operations for mapping to OMOP. Our framework also supports transparency of the mapping process and reuse by different institutions.
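
The abstract describes the pipeline only at a high level: mapping logic is written as SQL snippets organized in YAML, and a compiler turns those files into a single ETL SQL script. The record does not show the DML's actual syntax, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. The mapping keys (`source`, `columns`, `where`), the source table `src.patients`, and the functions `compile_table` and `compile_script` are all hypothetical. The mapping is written as a Python dict shaped like what a YAML loader would return, so the example needs only the standard library. The gender concept IDs 8507 (male) and 8532 (female) are standard OMOP concepts.

```python
# Minimal sketch of compiling a YAML-style mapping into an ETL SQL script.
# ASSUMPTION: the mapping schema ('source', 'columns', 'where') is hypothetical
# and is not the DML defined in the paper.
from typing import Any, Dict

# Hypothetical mapping, shaped like the output of a YAML loader
# (e.g. yaml.safe_load); one entry per OMOP target table.
MAPPING: Dict[str, Dict[str, Any]] = {
    "person": {
        "source": "src.patients p",  # hypothetical source table and alias
        "columns": {
            # target OMOP column -> SQL expression over the source table
            "person_id": "p.patient_id",
            "gender_concept_id": (
                "CASE p.sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END"
            ),
            "year_of_birth": "EXTRACT(YEAR FROM p.birth_date)",
            "race_concept_id": "0",  # 0 = no matching concept (placeholder)
            "ethnicity_concept_id": "0",  # placeholder, as above
        },
        "where": "p.patient_id IS NOT NULL",
    },
}


def compile_table(target: str, spec: Dict[str, Any]) -> str:
    """Turn one table mapping into an INSERT ... SELECT statement."""
    columns = spec["columns"]
    target_cols = ",\n    ".join(columns)
    select_exprs = ",\n    ".join(
        f"{expr} AS {col}" for col, expr in columns.items()
    )
    sql = (
        f"INSERT INTO {target} (\n    {target_cols}\n)\n"
        f"SELECT\n    {select_exprs}\nFROM {spec['source']}"
    )
    if spec.get("where"):
        sql += f"\nWHERE {spec['where']}"
    return sql + ";"


def compile_script(mapping: Dict[str, Dict[str, Any]]) -> str:
    """Concatenate the statements for every target table into one script."""
    return "\n\n".join(
        compile_table(target, spec) for target, spec in mapping.items()
    )


if __name__ == "__main__":
    print(compile_script(MAPPING))
```

In this sketch, each target column is paired with its own SQL expression, so the compiler stays generic across source datasets: adapting it to a new source means editing the mapping, not the code that emits the script.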
Record ID: pubmed-9000122
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: PLoS One, Research Article, published 2022-04-11 (PMC9000122; PMID 35404974; http://dx.doi.org/10.1371/journal.pone.0266911). Text in English.
Rights: © 2022 Quiroz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Topic: Research Article