Evaluating the Impact of Data Representation on EHR-Based Analytic Tasks

Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations, so the tasks can be optimally solved. We classified representations into broad categories based on...

Bibliographic Details
Main Authors: Oh, Wonsuk, Steinbach, Michael S., Castro, M. Regina, Peterson, Kevin A., Kumar, Vipin, Caraballo, Pedro J., Simon, Gyorgy J.
Format: Online Article Text
Language: English
Published: 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7666864/
https://www.ncbi.nlm.nih.gov/pubmed/31437931
http://dx.doi.org/10.3233/SHTI190229
author Oh, Wonsuk
Steinbach, Michael S.
Castro, M. Regina
Peterson, Kevin A.
Kumar, Vipin
Caraballo, Pedro J.
Simon, Gyorgy J.
collection PubMed
description Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations, so the tasks can be optimally solved. We classified representations into broad categories based on their characteristics, and proposed a new knowledge-driven representation for clinical data mining as well as trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, it outperformed the competing representation by (relative) 20%. For association mining, SEV achieved the highest performance. Its ability to constrain the search space of patterns through clinical knowledge was key to its success.
format Online
Article
Text
id pubmed-7666864
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-7666864 2020-11-15 Evaluating the Impact of Data Representation on EHR-Based Analytic Tasks. Oh, Wonsuk; Steinbach, Michael S.; Castro, M. Regina; Peterson, Kevin A.; Kumar, Vipin; Caraballo, Pedro J.; Simon, Gyorgy J. Stud Health Technol Inform. Article. 2019-08-21. /pmc/articles/PMC7666864/ /pubmed/31437931 http://dx.doi.org/10.3233/SHTI190229 Text en. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/).
title Evaluating the Impact of Data Representation on EHR-Based Analytic Tasks
topic Article