Anomaly Detection in COVID-19 Time-Series Data

Anomaly detection and explanation in big volumes of real-world medical data, such as those pertaining to COVID-19, pose some challenges. First, we are dealing with time-series data. Typical time-series data describe behavior of a single object over time. In medical data, we are dealing with time-ser...

Bibliographic Details
Main Authors:	Homayouni, Hajar, Ray, Indrakshi, Ghosh, Sudipto, Gondalia, Shlok, Kahn, Michael G.
Format:	Online Article Text
Language:	English
Published:	Springer Singapore 2021
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132285/ https://www.ncbi.nlm.nih.gov/pubmed/34027432 http://dx.doi.org/10.1007/s42979-021-00658-w

_version_	1783694887332347904
author	Homayouni, Hajar Ray, Indrakshi Ghosh, Sudipto Gondalia, Shlok Kahn, Michael G.
author_facet	Homayouni, Hajar Ray, Indrakshi Ghosh, Sudipto Gondalia, Shlok Kahn, Michael G.
author_sort	Homayouni, Hajar
collection	PubMed
description	Anomaly detection and explanation in big volumes of real-world medical data, such as those pertaining to COVID-19, pose some challenges. First, we are dealing with time-series data. Typical time-series data describe behavior of a single object over time. In medical data, we are dealing with time-series data belonging to multiple entities. Thus, there may be multiple subsets of records such that records in each subset, which belong to a single entity are temporally dependent, but the records in different subsets are unrelated. Moreover, the records in a subset contain different types of attributes, some of which must be grouped in a particular manner to make the analysis meaningful. Anomaly detection techniques need to be customized for time-series data belonging to multiple entities. Second, anomaly detection techniques fail to explain the cause of outliers to the experts. This is critical for new diseases and pandemics where current knowledge is insufficient. We propose to address these issues by extending our existing work called IDEAL, which is an LSTM-autoencoder based approach for data quality testing of sequential records, and provides explanations of constraint violations in a manner that is understandable to end-users. The extension (1) uses a novel two-level reshaping technique that splits COVID-19 data sets into multiple temporally-dependent subsequences and (2) adds a data visualization plot to further explain the anomalies and evaluate the level of abnormality of subsequences detected by IDEAL. We performed two systematic evaluation studies for our anomalous subsequence detection. One study uses aggregate data, including the number of cases, deaths, recovered, and percentage of hospitalization rate, collected from a COVID tracking project, New York Times, and Johns Hopkins for the same time period. The other study uses COVID-19 patient medical records obtained from Anschutz Medical Center health data warehouse. The results are promising and indicate that our techniques can be used to detect anomalies in large volumes of real-world unlabeled data whose accuracy or validity is unknown.
format	Online Article Text
id	pubmed-8132285
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer Singapore
record_format	MEDLINE/PubMed
spelling	pubmed-8132285 2021-05-19 Anomaly Detection in COVID-19 Time-Series Data Homayouni, Hajar Ray, Indrakshi Ghosh, Sudipto Gondalia, Shlok Kahn, Michael G. SN Comput Sci Original Research Springer Singapore 2021-05-19 2021 /pmc/articles/PMC8132285/ /pubmed/34027432 http://dx.doi.org/10.1007/s42979-021-00658-w Text en © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Original Research Homayouni, Hajar Ray, Indrakshi Ghosh, Sudipto Gondalia, Shlok Kahn, Michael G. Anomaly Detection in COVID-19 Time-Series Data
title	Anomaly Detection in COVID-19 Time-Series Data
title_full	Anomaly Detection in COVID-19 Time-Series Data
title_fullStr	Anomaly Detection in COVID-19 Time-Series Data
title_full_unstemmed	Anomaly Detection in COVID-19 Time-Series Data
title_short	Anomaly Detection in COVID-19 Time-Series Data
title_sort	anomaly detection in covid-19 time-series data
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132285/ https://www.ncbi.nlm.nih.gov/pubmed/34027432 http://dx.doi.org/10.1007/s42979-021-00658-w
work_keys_str_mv	AT homayounihajar anomalydetectionincovid19timeseriesdata AT rayindrakshi anomalydetectionincovid19timeseriesdata AT ghoshsudipto anomalydetectionincovid19timeseriesdata AT gondaliashlok anomalydetectionincovid19timeseriesdata AT kahnmichaelg anomalydetectionincovid19timeseriesdata
