
ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data

Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods,...


Bibliographic Details
Main authors: Zhou, Yi-Hui, Saghapour, Ehsan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283820/
https://www.ncbi.nlm.nih.gov/pubmed/34276792
http://dx.doi.org/10.3389/fgene.2021.691274
_version_ 1783723277375504384
author Zhou, Yi-Hui
Saghapour, Ehsan
author_facet Zhou, Yi-Hui
Saghapour, Ehsan
author_sort Zhou, Yi-Hui
collection PubMed
description Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods mainly require scripting skills and are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
format Online
Article
Text
id pubmed-8283820
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8283820 2021-07-17 ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data Zhou, Yi-Hui Saghapour, Ehsan Front Genet Genetics Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods mainly require scripting skills and are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis. Frontiers Media S.A. 2021-07-02 /pmc/articles/PMC8283820/ /pubmed/34276792 http://dx.doi.org/10.3389/fgene.2021.691274 Text en Copyright © 2021 Zhou and Saghapour.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Zhou, Yi-Hui
Saghapour, Ehsan
ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
title ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
title_full ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
title_fullStr ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
title_full_unstemmed ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
title_short ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
title_sort imputehr: a visualization tool of imputation for the prediction of biomedical data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283820/
https://www.ncbi.nlm.nih.gov/pubmed/34276792
http://dx.doi.org/10.3389/fgene.2021.691274
work_keys_str_mv AT zhouyihui imputehravisualizationtoolofimputationforthepredictionofbiomedicaldata
AT saghapourehsan imputehravisualizationtoolofimputationforthepredictionofbiomedicaldata