Using Information Entropy to Monitor Chief Complaint Characteristics and Quality
Main authors: | |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | University of Illinois at Chicago Library, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692809/ |
Summary:

OBJECTIVE: We describe how entropy, a key information measure, can be used to monitor the characteristics of chief complaints in an operational surveillance system.

INTRODUCTION: Health care processes consume increasing volumes of digital data. However, creating and leveraging high-quality integrated health data is challenging: large-scale health data derives from systems that capture data through varying workflows, which yields varying data quality and can limit the data's utility for purposes such as population health. To ensure accurate results, data quality must be assessed for each intended use. Examples of sub-optimal health data quality abound: the accuracy of medication and diagnostic data varies in hospital discharge and claims data; electronic laboratory data used to identify notifiable public health cases shows varying levels of completeness across data sources; and data timeliness has been found to vary across data sources. Given the increasing focus on large health data sources, the known data quality issues that hinder the utility of such data, and the paucity of medical literature describing approaches for evaluating these issues across integrated health data sources, we hypothesize that novel methods for ongoing monitoring of data quality in rapidly growing large health data sets, including surveillance data, will improve the accuracy and overall utility of these data.

METHODS: Our analysis used chief complaint data derived from the original real-time HL7 registration transactions for emergency department (ED) encounters over a 3-year study period, January 1, 2008 to December 30, 2010, from more than 100 institutions participating in the Indiana Public Health Emergency Surveillance System (PHESS) [1]. We used the following syndrome categories, drawn from various syndrome definitions: respiratory, influenza-like illness, gastrointestinal, neurological, undifferentiated infection, skin, and lymphatic. We calculated the entropy of the chief complaint data [2]. Entropy measures uncertainty and characterizes the average information content of a message; it is commonly measured in bits. We analyzed entropy stratified (a) by syndrome category, (b) by syndrome category and time, and (c) by syndrome category, time, and source institution.

RESULTS: Analysis of more than 7.4 million records revealed three findings. First, overall information content varied by syndrome, with “neurological” showing the greatest entropy (Figure 1). Second, entropy measures followed consistent intraorganizational trends: information content varied less within an organization than across organizations (Figure 2). Third, information entropy enables detection of otherwise unannounced changes in system behavior. Figure 3 illustrates monthly entropy for the respiratory syndrome from 5 sources over 36 months. When one source changed its registration software, its visit volume did not change, but the information content of its chief complaints did, as indicated by a substantial shift in entropy.

CONCLUSIONS: As data volumes grow, methods that assess data quality for particular uses, including syndromic surveillance, are needed. This analysis shows the value of entropy as a metric to support the monitoring of surveillance systems. Future work will refine these measures and further assess inter-organizational variation in entropy.
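The abstract does not specify how entropy was computed beyond citing [2]. As a minimal sketch, assuming Shannon entropy H = -Σ p_i log2(p_i), in bits, over the empirical distribution of distinct, normalized chief-complaint strings within each stratum, the stratified analysis (by syndrome, month, and source institution) and the kind of month-to-month shift described for Figure 3 could be computed as below. The `Visit` record layout, the `normalize` rule, and the 1-bit `threshold` in `flag_shifts` are hypothetical choices for illustration, not the authors' method; the paper may instead compute entropy over words or characters.

```python
"""Sketch: Shannon entropy of chief complaints, stratified by syndrome, month, source.

Assumption (not stated in the abstract): entropy is taken over the distribution
of distinct, normalized chief-complaint strings within each stratum.
"""
from collections import Counter, defaultdict
from math import log2
from typing import Dict, Iterable, List, NamedTuple, Sequence, Tuple


class Visit(NamedTuple):
    # Hypothetical record layout; real PHESS/HL7 fields will differ.
    source: str           # reporting institution
    month: str            # e.g. "2009-03"
    syndrome: str         # e.g. "respiratory"
    chief_complaint: str  # free-text chief complaint


def shannon_entropy(values: Iterable[str]) -> float:
    """Return H = sum(p * log2(1/p)) in bits over the distinct values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # log2(total / c) == -log2(p); written this way to avoid a -0.0 result.
    return sum((c / total) * log2(total / c) for c in counts.values())


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def entropy_by_stratum(
    visits: Iterable[Visit],
    keys: Sequence[str] = ("syndrome", "month", "source"),
) -> Dict[Tuple[str, ...], float]:
    """Group visits by the given fields and compute entropy per group."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for v in visits:
        key = tuple(getattr(v, k) for k in keys)
        groups[key].append(normalize(v.chief_complaint))
    return {key: shannon_entropy(ccs) for key, ccs in groups.items()}


def flag_shifts(series: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Return months whose entropy differs from the prior month by > threshold bits.

    `series` maps month -> entropy for one (syndrome, source) pair.
    `threshold` is a hypothetical tuning parameter.
    """
    months = sorted(series)
    return [m for prev, m in zip(months, months[1:])
            if abs(series[m] - series[prev]) > threshold]


if __name__ == "__main__":
    # Toy data, invented for illustration: source "A" changes behavior in February.
    visits = [Visit("A", "2009-01", "respiratory", cc)
              for cc in ["Cough", "cough", "COUGH ", "cough"]]
    visits += [Visit("A", "2009-02", "respiratory", cc)
               for cc in ["cough", "sob", "fever cough", "wheezing"]]
    table = entropy_by_stratum(visits)
    series = {month: h for (syn, month, src), h in table.items()
              if syn == "respiratory" and src == "A"}
    print(series)               # {'2009-01': 0.0, '2009-02': 2.0}
    print(flag_shifts(series))  # ['2009-02']
```

A fixed month-over-month threshold is the simplest possible change rule. Because the abstract reports that entropy varies less within an organization than across organizations, a monitor would more plausibly compare each month against a baseline window for the same source rather than against a single global cutoff.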