
Anomaly Detection in COVID-19 Time-Series Data

Anomaly detection and explanation in big volumes of real-world medical data, such as those pertaining to COVID-19, pose some challenges. First, we are dealing with time-series data. Typical time-series data describe behavior of a single object over time. In medical data, we are dealing with time-ser...


Bibliographic Details
Main Authors: Homayouni, Hajar; Ray, Indrakshi; Ghosh, Sudipto; Gondalia, Shlok; Kahn, Michael G.
Format: Online Article Text
Language: English
Published: Springer Singapore 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132285/
https://www.ncbi.nlm.nih.gov/pubmed/34027432
http://dx.doi.org/10.1007/s42979-021-00658-w
_version_ 1783694887332347904
author Homayouni, Hajar
Ray, Indrakshi
Ghosh, Sudipto
Gondalia, Shlok
Kahn, Michael G.
author_facet Homayouni, Hajar
Ray, Indrakshi
Ghosh, Sudipto
Gondalia, Shlok
Kahn, Michael G.
author_sort Homayouni, Hajar
collection PubMed
description Anomaly detection and explanation in big volumes of real-world medical data, such as those pertaining to COVID-19, pose some challenges. First, we are dealing with time-series data. Typical time-series data describe behavior of a single object over time. In medical data, we are dealing with time-series data belonging to multiple entities. Thus, there may be multiple subsets of records such that records in each subset, which belong to a single entity are temporally dependent, but the records in different subsets are unrelated. Moreover, the records in a subset contain different types of attributes, some of which must be grouped in a particular manner to make the analysis meaningful. Anomaly detection techniques need to be customized for time-series data belonging to multiple entities. Second, anomaly detection techniques fail to explain the cause of outliers to the experts. This is critical for new diseases and pandemics where current knowledge is insufficient. We propose to address these issues by extending our existing work called IDEAL, which is an LSTM-autoencoder based approach for data quality testing of sequential records, and provides explanations of constraint violations in a manner that is understandable to end-users. The extension (1) uses a novel two-level reshaping technique that splits COVID-19 data sets into multiple temporally-dependent subsequences and (2) adds a data visualization plot to further explain the anomalies and evaluate the level of abnormality of subsequences detected by IDEAL. We performed two systematic evaluation studies for our anomalous subsequence detection. One study uses aggregate data, including the number of cases, deaths, recovered, and percentage of hospitalization rate, collected from a COVID tracking project, New York Times, and Johns Hopkins for the same time period. The other study uses COVID-19 patient medical records obtained from Anschutz Medical Center health data warehouse. The results are promising and indicate that our techniques can be used to detect anomalies in large volumes of real-world unlabeled data whose accuracy or validity is unknown.
format Online
Article
Text
id pubmed-8132285
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Singapore
record_format MEDLINE/PubMed
spelling pubmed-81322852021-05-19 Anomaly Detection in COVID-19 Time-Series Data Homayouni, Hajar Ray, Indrakshi Ghosh, Sudipto Gondalia, Shlok Kahn, Michael G. SN Comput Sci Original Research Anomaly detection and explanation in big volumes of real-world medical data, such as those pertaining to COVID-19, pose some challenges. First, we are dealing with time-series data. Typical time-series data describe behavior of a single object over time. In medical data, we are dealing with time-series data belonging to multiple entities. Thus, there may be multiple subsets of records such that records in each subset, which belong to a single entity are temporally dependent, but the records in different subsets are unrelated. Moreover, the records in a subset contain different types of attributes, some of which must be grouped in a particular manner to make the analysis meaningful. Anomaly detection techniques need to be customized for time-series data belonging to multiple entities. Second, anomaly detection techniques fail to explain the cause of outliers to the experts. This is critical for new diseases and pandemics where current knowledge is insufficient. We propose to address these issues by extending our existing work called IDEAL, which is an LSTM-autoencoder based approach for data quality testing of sequential records, and provides explanations of constraint violations in a manner that is understandable to end-users. The extension (1) uses a novel two-level reshaping technique that splits COVID-19 data sets into multiple temporally-dependent subsequences and (2) adds a data visualization plot to further explain the anomalies and evaluate the level of abnormality of subsequences detected by IDEAL. We performed two systematic evaluation studies for our anomalous subsequence detection. One study uses aggregate data, including the number of cases, deaths, recovered, and percentage of hospitalization rate, collected from a COVID tracking project, New York Times, and Johns Hopkins for the same time period. The other study uses COVID-19 patient medical records obtained from Anschutz Medical Center health data warehouse. The results are promising and indicate that our techniques can be used to detect anomalies in large volumes of real-world unlabeled data whose accuracy or validity is unknown. Springer Singapore 2021-05-19 2021 /pmc/articles/PMC8132285/ /pubmed/34027432 http://dx.doi.org/10.1007/s42979-021-00658-w Text en © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Original Research
Homayouni, Hajar
Ray, Indrakshi
Ghosh, Sudipto
Gondalia, Shlok
Kahn, Michael G.
Anomaly Detection in COVID-19 Time-Series Data
title Anomaly Detection in COVID-19 Time-Series Data
title_full Anomaly Detection in COVID-19 Time-Series Data
title_fullStr Anomaly Detection in COVID-19 Time-Series Data
title_full_unstemmed Anomaly Detection in COVID-19 Time-Series Data
title_short Anomaly Detection in COVID-19 Time-Series Data
title_sort anomaly detection in covid-19 time-series data
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132285/
https://www.ncbi.nlm.nih.gov/pubmed/34027432
http://dx.doi.org/10.1007/s42979-021-00658-w
work_keys_str_mv AT homayounihajar anomalydetectionincovid19timeseriesdata
AT rayindrakshi anomalydetectionincovid19timeseriesdata
AT ghoshsudipto anomalydetectionincovid19timeseriesdata
AT gondaliashlok anomalydetectionincovid19timeseriesdata
AT kahnmichaelg anomalydetectionincovid19timeseriesdata
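
Illustrative note on the abstract: the description above mentions a two-level reshaping that splits multi-entity time-series data into temporally dependent subsequences. The record does not give the algorithm, so the following is only a minimal sketch of one plausible reading, assuming that level one groups records by entity and level two cuts each entity's time-ordered series into overlapping fixed-length windows. The function name `two_level_reshape`, its parameters, and the sample field names (`state`, `day`, `cases`, `deaths`) are hypothetical and are not taken from the paper or from IDEAL.

```python
from collections import defaultdict


def two_level_reshape(records, entity_key, time_key, feature_keys, window):
    """Split multi-entity records into per-entity sliding windows.

    Assumed reading of "two-level reshaping" (not the authors' code):
      level 1: group records by entity so unrelated entities never mix;
      level 2: slide a fixed-length window over each entity's ordered series.
    Returns a list of (entity, subsequence) pairs, where each subsequence
    is a list of `window` feature vectors.
    """
    # Level 1: one bucket of records per entity.
    by_entity = defaultdict(list)
    for rec in records:
        by_entity[rec[entity_key]].append(rec)

    subsequences = []
    for entity, recs in by_entity.items():
        # Restore temporal order inside the entity before windowing.
        ordered = sorted(recs, key=lambda r: r[time_key])
        series = [[float(r[k]) for k in feature_keys] for r in ordered]
        # Level 2: overlapping windows; entities shorter than `window`
        # contribute nothing because the range is empty.
        for start in range(len(series) - window + 1):
            subsequences.append((entity, series[start:start + window]))
    return subsequences


# Hypothetical aggregate data: two regions, daily counts, given out of order.
sample = [
    {"state": "CO", "day": d, "cases": d * 10, "deaths": d}
    for d in (4, 2, 0, 3, 1)
] + [
    {"state": "NY", "day": d, "cases": d * 20, "deaths": 2 * d}
    for d in range(4)
]

windows = two_level_reshape(sample, "state", "day", ["cases", "deaths"], window=3)
for entity, subseq in windows:
    print(entity, subseq)
```

Each resulting subsequence could then be fed to a sequence model such as an LSTM autoencoder, with reconstruction error used as an abnormality score; that downstream step is not shown here.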