
TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse

Background  During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often based on black boxes, in which clinical researchers are unaware of how the data were recorded, extracted, and transforme...

Bibliographic Details
Main authors: Pedrera-Jiménez, Miguel, García-Barrio, Noelia, Rubio-Mayo, Paula, Tato-Gómez, Alberto, Cruz-Bermúdez, Juan Luis, Bernal-Sobrino, José Luis, Muñoz-Carrero, Adolfo, Serrano-Balazote, Pablo
Format: Online Article Text
Language: English
Published: Georg Thieme Verlag KG 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9788916/
https://www.ncbi.nlm.nih.gov/pubmed/36220109
http://dx.doi.org/10.1055/s-0042-1757763
_version_ 1784858860231589888
author Pedrera-Jiménez, Miguel
García-Barrio, Noelia
Rubio-Mayo, Paula
Tato-Gómez, Alberto
Cruz-Bermúdez, Juan Luis
Bernal-Sobrino, José Luis
Muñoz-Carrero, Adolfo
Serrano-Balazote, Pablo
author_sort Pedrera-Jiménez, Miguel
collection PubMed
description Background  During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often based on black boxes, in which clinical researchers are unaware of how the data were recorded, extracted, and transformed. To solve this, it is essential that extract, transform, and load (ETL) processes be based on transparent, homogeneous, and formal methodologies, making them understandable, reproducible, and auditable. Objectives  This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. Methods  The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file in XML. Results  First, four international projects were analyzed to identify 17 operations necessary to obtain datasets from the EHR according to the specifications of these projects. Next, each of the data operations was formalized using the ISO 13606 reference model, specifying the valid data types as arguments, inputs, and outputs, and their cardinality. Then, an agnostic catalog of data was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
Conclusions  This study has provided a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results.
format Online
Article
Text
id pubmed-9788916
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Georg Thieme Verlag KG
record_format MEDLINE/PubMed
spelling pubmed-9788916 2022-12-24 TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse Pedrera-Jiménez, Miguel García-Barrio, Noelia Rubio-Mayo, Paula Tato-Gómez, Alberto Cruz-Bermúdez, Juan Luis Bernal-Sobrino, José Luis Muñoz-Carrero, Adolfo Serrano-Balazote, Pablo Methods Inf Med Background  During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often based on black boxes, in which clinical researchers are unaware of how the data were recorded, extracted, and transformed. To solve this, it is essential that extract, transform, and load (ETL) processes be based on transparent, homogeneous, and formal methodologies, making them understandable, reproducible, and auditable. Objectives  This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. Methods  The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file in XML. Results  First, four international projects were analyzed to identify 17 operations necessary to obtain datasets from the EHR according to the specifications of these projects. 
Next, each of the data operations was formalized using the ISO 13606 reference model, specifying the valid data types as arguments, inputs, and outputs, and their cardinality. Then, an agnostic catalog of data was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file. Conclusions  This study has provided a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results. Georg Thieme Verlag KG 2022-10-11 /pmc/articles/PMC9788916/ /pubmed/36220109 http://dx.doi.org/10.1055/s-0042-1757763 Text en The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. ( https://creativecommons.org/licenses/by-nc-nd/4.0/ ) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited.
title TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9788916/
https://www.ncbi.nlm.nih.gov/pubmed/36220109
http://dx.doi.org/10.1055/s-0042-1757763