Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data

OBJECTIVE: In applying machine learning (ML) to electronic health record (EHR) data, many decisions must be made before any ML is applied; such preprocessing requires substantial effort and can be labor-intensive. As the role of ML in health care grows, there is an increasing need for systematic and...

Bibliographic Details
Main Authors:	Tang, Shengpu; Davarmanesh, Parmida; Song, Yanmeng; Koutra, Danai; Sjoding, Michael W; Wiens, Jenna
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7727385/ https://www.ncbi.nlm.nih.gov/pubmed/33040151 http://dx.doi.org/10.1093/jamia/ocaa139

author	Tang, Shengpu; Davarmanesh, Parmida; Song, Yanmeng; Koutra, Danai; Sjoding, Michael W; Wiens, Jenna
collection	PubMed
description	OBJECTIVE: In applying machine learning (ML) to electronic health record (EHR) data, many decisions must be made before any ML is applied; such preprocessing requires substantial effort and can be labor-intensive. As the role of ML in health care grows, there is an increasing need for systematic and reproducible preprocessing techniques for EHR data. Thus, we developed FIDDLE (Flexible Data-Driven Pipeline), an open-source framework that streamlines the preprocessing of data extracted from the EHR. MATERIALS AND METHODS: Largely data-driven, FIDDLE systematically transforms structured EHR data into feature vectors, limiting the number of decisions a user must make while incorporating good practices from the literature. To demonstrate its utility and flexibility, we conducted a proof-of-concept experiment in which we applied FIDDLE to 2 publicly available EHR data sets collected from intensive care units: MIMIC-III and the eICU Collaborative Research Database. We trained different ML models to predict 3 clinically important outcomes: in-hospital mortality, acute respiratory failure, and shock. We evaluated models using the area under the receiver operating characteristics curve (AUROC), and compared it to several baselines. RESULTS: Across tasks, FIDDLE extracted 2,528 to 7,403 features from MIMIC-III and eICU, respectively. On all tasks, FIDDLE-based models achieved good discriminative performance, with AUROCs of 0.757–0.886, comparable to the performance of MIMIC-Extract, a preprocessing pipeline designed specifically for MIMIC-III. Furthermore, our results showed that FIDDLE is generalizable across different prediction times, ML algorithms, and data sets, while being relatively robust to different settings of user-defined arguments. CONCLUSIONS: FIDDLE, an open-source preprocessing pipeline, facilitates applying ML to structured EHR data. By accelerating and standardizing labor-intensive preprocessing, FIDDLE can help stimulate progress in building clinically useful ML tools for EHR data.
format	Online Article Text
id	pubmed-7727385
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7727385 2020-12-16 Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data. Tang, Shengpu; Davarmanesh, Parmida; Song, Yanmeng; Koutra, Danai; Sjoding, Michael W; Wiens, Jenna. J Am Med Inform Assoc. Research and Applications. Oxford University Press 2020-10-11 /pmc/articles/PMC7727385/ /pubmed/33040151 http://dx.doi.org/10.1093/jamia/ocaa139 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7727385/ https://www.ncbi.nlm.nih.gov/pubmed/33040151 http://dx.doi.org/10.1093/jamia/ocaa139
