A clustering approach for detecting implausible observation values in electronic health records data

BACKGROUND: Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausi...

Bibliographic Details
Main Authors:	Estiri, Hossein; Klann, Jeffrey G.; Murphy, Shawn N.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Technical Advance
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6652024/ https://www.ncbi.nlm.nih.gov/pubmed/31337390 http://dx.doi.org/10.1186/s12911-019-0852-6

_version_	1783438480956719104
author	Estiri, Hossein; Klann, Jeffrey G.; Murphy, Shawn N.
author_sort	Estiri, Hossein
collection	PubMed
description	BACKGROUND: Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs. METHODS: The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data as an alternative algorithmic solution to the existing procedures. Our approach is built upon two underlying hypotheses: (i) when there is a large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance. RESULTS: We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, our proposed clustering approach resulted in a significantly smaller number of false-positive cases.
CONCLUSION: Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality. A workflow is therefore needed to complement the algorithm and initiate the actions required to improve data quality. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0852-6) contains supplementary material, which is available to authorized users.
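The two hypotheses in METHODS (implausible values are sparse, so sparsely populated clusters should hold them) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not name the clustering algorithm, so plain one-dimensional k-means is swapped in here as an assumption. The function names, the parameters `k` and `min_share`, the synthetic values, and the `low`/`high` bounds (standing in for a silver-standard plausibility threshold) are all hypothetical.

```python
from collections import Counter
from statistics import mean


def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on scalars (requires k >= 2); returns one label per value."""
    ordered = sorted(values)
    # Deterministic start: k points evenly spaced through the sorted data.
    centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(mean(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels


def flag_sparse_clusters(values, k=5, min_share=0.05):
    """Flag observations whose cluster holds less than min_share of all observations."""
    labels = kmeans_1d(values, k)
    sizes = Counter(labels)
    sparse = {c for c, n in sizes.items() if n / len(values) < min_share}
    return [lab in sparse for lab in labels]


def sensitivity_specificity(flags, values, low, high):
    """Compare flags with a plausibility range [low, high] (hypothetical threshold)."""
    truth = [not (low <= v <= high) for v in values]
    tp = sum(f and t for f, t in zip(flags, truth))
    tn = sum((not f) and (not t) for f, t in zip(flags, truth))
    fp = sum(f and (not t) for f, t in zip(flags, truth))
    fn = sum((not f) and t for f, t in zip(flags, truth))
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Synthetic example: 100 values in a plausible band plus two extreme entries.
values = [90 + (i % 21) for i in range(100)] + [950, 1000]
flags = flag_sparse_clusters(values, k=3, min_share=0.05)
sens, spec = sensitivity_specificity(flags, values, low=0, high=500)
```

The choice of `k` and `min_share` drives what counts as "sparse"; the abstract reports that different specifications of the clustering approach were tested, so any fixed values here are only placeholders.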
format	Online Article Text
id	pubmed-6652024
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6652024 2019-07-31 A clustering approach for detecting implausible observation values in electronic health records data Estiri, Hossein Klann, Jeffrey G. Murphy, Shawn N. BMC Med Inform Decis Mak Technical Advance BioMed Central 2019-07-23 /pmc/articles/PMC6652024/ /pubmed/31337390 http://dx.doi.org/10.1186/s12911-019-0852-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Technical Advance Estiri, Hossein Klann, Jeffrey G. Murphy, Shawn N. A clustering approach for detecting implausible observation values in electronic health records data
title	A clustering approach for detecting implausible observation values in electronic health records data
title_sort	clustering approach for detecting implausible observation values in electronic health records data
topic	Technical Advance
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6652024/ https://www.ncbi.nlm.nih.gov/pubmed/31337390 http://dx.doi.org/10.1186/s12911-019-0852-6
