ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods,...
Main authors: | Zhou, Yi-Hui; Saghapour, Ehsan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283820/ https://www.ncbi.nlm.nih.gov/pubmed/34276792 http://dx.doi.org/10.3389/fgene.2021.691274 |
_version_ | 1783723277375504384 |
---|---|
author | Zhou, Yi-Hui Saghapour, Ehsan |
author_facet | Zhou, Yi-Hui Saghapour, Ehsan |
author_sort | Zhou, Yi-Hui |
collection | PubMed |
description | Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly requires scripting skills, and the methods are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis. |
format | Online Article Text |
id | pubmed-8283820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8283820 2021-07-17 ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data Zhou, Yi-Hui Saghapour, Ehsan Front Genet Genetics Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly requires scripting skills, and the methods are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis. Frontiers Media S.A. 2021-07-02 /pmc/articles/PMC8283820/ /pubmed/34276792 http://dx.doi.org/10.3389/fgene.2021.691274 Text en Copyright © 2021 Zhou and Saghapour.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Zhou, Yi-Hui Saghapour, Ehsan ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data |
title | ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data |
title_full | ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data |
title_fullStr | ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data |
title_full_unstemmed | ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data |
title_short | ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data |
title_sort | imputehr: a visualization tool of imputation for the prediction of biomedical data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8283820/ https://www.ncbi.nlm.nih.gov/pubmed/34276792 http://dx.doi.org/10.3389/fgene.2021.691274 |
work_keys_str_mv | AT zhouyihui imputehravisualizationtoolofimputationforthepredictionofbiomedicaldata AT saghapourehsan imputehravisualizationtoolofimputationforthepredictionofbiomedicaldata |