FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling
OBJECTIVES: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessin...
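Main Authors: | Datta, Suparno; Sachs, Jan Philipp; Freitas Da Cruz, Harry; Martensen, Tom; Bode, Philipp; Morassi Sasso, Ariane; Glicksberg, Benjamin S; Böttinger, Erwin |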
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327378/ https://www.ncbi.nlm.nih.gov/pubmed/34350388 http://dx.doi.org/10.1093/jamiaopen/ooab048 |
_version_ | 1783732062403952640 |
---|---|
author | Datta, Suparno; Sachs, Jan Philipp; Freitas Da Cruz, Harry; Martensen, Tom; Bode, Philipp; Morassi Sasso, Ariane; Glicksberg, Benjamin S; Böttinger, Erwin |
author_sort | Datta, Suparno |
collection | PubMed |
description | OBJECTIVES: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. MATERIALS AND METHODS: FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER’s capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. RESULTS: Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case. CONCLUSION: FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process. |
format | Online Article Text |
id | pubmed-8327378 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8327378 2021-08-03 FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling Datta, Suparno; Sachs, Jan Philipp; Freitas Da Cruz, Harry; Martensen, Tom; Bode, Philipp; Morassi Sasso, Ariane; Glicksberg, Benjamin S; Böttinger, Erwin JAMIA Open Research and Applications Oxford University Press 2021-08-02 /pmc/articles/PMC8327378/ /pubmed/34350388 http://dx.doi.org/10.1093/jamiaopen/ooab048 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling |
topic | Research and Applications |
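
Illustrative note: the abstract describes building a cohort from a star schema (i2b2) warehouse and flagging which patients later developed an outcome. The sketch below shows that general idea with Python's built-in `sqlite3` module only. It does not use or reproduce FIBER's API; the table layout is a heavily simplified stand-in for i2b2, and every table name, concept code, and row is a made-up assumption for illustration.

```python
import sqlite3

# Hypothetical, simplified star schema: one fact table keyed to a patient
# dimension. Real i2b2 tables carry many more columns; data here is invented.
SCHEMA = """
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, birth_year INTEGER);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
"""

PATIENTS = [(1, 1950), (2, 1962), (3, 1948)]
FACTS = [
    (1, "PROC:HEART_SURGERY", "2015-03-01"),
    (1, "DX:AKI", "2015-03-04"),
    (2, "PROC:HEART_SURGERY", "2016-07-10"),
    (2, "DX:HYPERTENSION", "2016-01-05"),
    (3, "DX:AKI", "2014-02-02"),  # no index event, so excluded from the cohort
]

# One row per patient with the index event, plus a 0/1 flag that is 1 when
# the outcome concept was recorded after the index event date.
COHORT_QUERY = """
SELECT p.patient_num,
       p.birth_year,
       s.start_date AS index_date,
       EXISTS (
           SELECT 1 FROM observation_fact o
           WHERE o.patient_num = s.patient_num
             AND o.concept_cd = :outcome
             AND o.start_date > s.start_date
       ) AS developed_outcome
FROM observation_fact s
JOIN patient_dimension p ON p.patient_num = s.patient_num
WHERE s.concept_cd = :index_event
ORDER BY p.patient_num
"""


def build_cohort(conn, index_event, outcome):
    """Return the cohort as a list of dicts (a stand-in for a data frame)."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(COHORT_QUERY, {"index_event": index_event, "outcome": outcome})
    return [dict(row) for row in rows]


def demo():
    """Load the toy data into an in-memory database and build the cohort."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient_dimension VALUES (?, ?)", PATIENTS)
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", FACTS)
    return build_cohort(conn, "PROC:HEART_SURGERY", "DX:AKI")
```

Calling `demo()` yields one record per surgery patient with a `developed_outcome` flag. ISO-formatted date strings are used so that plain string comparison orders events correctly in SQLite.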