FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling

OBJECTIVES: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessin...

Bibliographic Details
Main authors: Datta, Suparno, Sachs, Jan Philipp, FreitasDa Cruz, Harry, Martensen, Tom, Bode, Philipp, Morassi Sasso, Ariane, Glicksberg, Benjamin S, Böttinger, Erwin
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327378/
https://www.ncbi.nlm.nih.gov/pubmed/34350388
http://dx.doi.org/10.1093/jamiaopen/ooab048
author Datta, Suparno
Sachs, Jan Philipp
FreitasDa Cruz, Harry
Martensen, Tom
Bode, Philipp
Morassi Sasso, Ariane
Glicksberg, Benjamin S
Böttinger, Erwin
collection PubMed
description
OBJECTIVES: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames.
MATERIALS AND METHODS: FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER’s capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models.
RESULTS: Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case.
CONCLUSION: FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process.
format Online
Article
Text
id pubmed-8327378
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8327378 2021-08-03 JAMIA Open Research and Applications Oxford University Press 2021-08-02 Text en
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling
topic Research and Applications