ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset

Bibliographic Details
Main Authors: Huang, Yan; Li, Xiaojin; Zhang, Guo-Qiang
Format: Online Article Text
Language: English
Published: The Author(s). Published by Elsevier Inc. 2021
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9759789/
https://www.ncbi.nlm.nih.gov/pubmed/33775815
http://dx.doi.org/10.1016/j.jbi.2021.103744
_version_ 1784852310614081536
author Huang, Yan
Li, Xiaojin
Zhang, Guo-Qiang
author_facet Huang, Yan
Li, Xiaojin
Zhang, Guo-Qiang
author_sort Huang, Yan
collection PubMed
description Fast temporal query on large EHR-derived data sources presents an emerging big data challenge, as this query modality is intractable using conventional strategies that have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach called Event-level Inverted Index (ELII) to optimize time trade-offs between one-time batch preprocessing and subsequent open-ended, user-specified temporal queries. An experimental temporal query engine has been implemented in a NoSQL database using our new ELII strategy. Near-real-time performance was achieved on a large Covid-19 EHR dataset, with 1.3 million unique patients and 3.76 billion records. We evaluated the performance of ELII on several types of queries: classical (non-temporal), absolute temporal, and relative temporal. Our experimental results indicate that ELII accomplished these queries in seconds, achieving average speed accelerations of 26.8 times on relative temporal query, 88.6 times on absolute temporal query, and 1037.6 times on classical query compared to a baseline approach without using ELII. Our study suggests that ELII is a promising approach supporting fast temporal query, an important mode of cohort development for Covid-19 studies.
format Online
Article
Text
id pubmed-9759789
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher The Author(s). Published by Elsevier Inc.
record_format MEDLINE/PubMed
spelling pubmed-9759789
2022-12-19
ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
Huang, Yan
Li, Xiaojin
Zhang, Guo-Qiang
J Biomed Inform
Original Research
Fast temporal query on large EHR-derived data sources presents an emerging big data challenge, as this query modality is intractable using conventional strategies that have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach called Event-level Inverted Index (ELII) to optimize time trade-offs between one-time batch preprocessing and subsequent open-ended, user-specified temporal queries. An experimental temporal query engine has been implemented in a NoSQL database using our new ELII strategy. Near-real-time performance was achieved on a large Covid-19 EHR dataset, with 1.3 million unique patients and 3.76 billion records. We evaluated the performance of ELII on several types of queries: classical (non-temporal), absolute temporal, and relative temporal. Our experimental results indicate that ELII accomplished these queries in seconds, achieving average speed accelerations of 26.8 times on relative temporal query, 88.6 times on absolute temporal query, and 1037.6 times on classical query compared to a baseline approach without using ELII. Our study suggests that ELII is a promising approach supporting fast temporal query, an important mode of cohort development for Covid-19 studies.
The Author(s). Published by Elsevier Inc.
2021-05
2021-03-26
/pmc/articles/PMC9759789/
/pubmed/33775815
http://dx.doi.org/10.1016/j.jbi.2021.103744
Text
en
© 2021 The Author(s)
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Original Research
Huang, Yan
Li, Xiaojin
Zhang, Guo-Qiang
ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
title ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
title_full ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
title_fullStr ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
title_full_unstemmed ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
title_short ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
title_sort elii: a novel inverted index for fast temporal query, with application to a large covid-19 ehr dataset
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9759789/
https://www.ncbi.nlm.nih.gov/pubmed/33775815
http://dx.doi.org/10.1016/j.jbi.2021.103744
work_keys_str_mv AT huangyan eliianovelinvertedindexforfasttemporalquerywithapplicationtoalargecovid19ehrdataset
AT lixiaojin eliianovelinvertedindexforfasttemporalquerywithapplicationtoalargecovid19ehrdataset
AT zhangguoqiang eliianovelinvertedindexforfasttemporalquerywithapplicationtoalargecovid19ehrdataset
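
The description in this record names three query types (classical, absolute temporal, relative temporal) and a trade-off between one-time batch preprocessing and later queries. The record does not describe the internal layout of ELII, so the following is only a minimal sketch of the general inverted-index idea, not the authors' implementation. It assumes an in-memory dictionary rather than a NoSQL database, and the class name `EventIndex`, its method names, the event codes, and the sample records are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date


class EventIndex:
    """Toy inverted index (hypothetical): event code -> patient id -> sorted dates."""

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(list))

    def build(self, records):
        """One-time batch preprocessing over (patient_id, code, date) tuples."""
        for patient_id, code, when in records:
            self._postings[code][patient_id].append(when)
        # Sort each posting list once so queries can use binary search.
        for patients in self._postings.values():
            for dates in patients.values():
                dates.sort()

    def classical(self, code):
        """Non-temporal query: patients with at least one `code` event."""
        return set(self._postings.get(code, {}))

    def absolute(self, code, start, end):
        """Absolute temporal query: `code` occurred within [start, end]."""
        hits = set()
        for patient_id, dates in self._postings.get(code, {}).items():
            i = bisect_left(dates, start)  # first event on or after `start`
            if i < len(dates) and dates[i] <= end:
                hits.add(patient_id)
        return hits

    def relative(self, first, then, max_days):
        """Relative temporal query: `then` occurs 0..max_days days after `first`."""
        hits = set()
        first_posts = self._postings.get(first, {})
        then_posts = self._postings.get(then, {})
        # Only patients who have both events can match.
        for patient_id in first_posts.keys() & then_posts.keys():
            later = then_posts[patient_id]
            for d in first_posts[patient_id]:
                i = bisect_left(later, d)  # earliest `then` on or after `d`
                if i < len(later) and (later[i] - d).days <= max_days:
                    hits.add(patient_id)
                    break
        return hits


# Invented sample data, for illustration only.
records = [
    ("p1", "covid_dx", date(2020, 4, 1)),
    ("p1", "ventilation", date(2020, 4, 5)),
    ("p2", "covid_dx", date(2020, 6, 10)),
    ("p2", "ventilation", date(2020, 8, 1)),
    ("p3", "covid_dx", date(2020, 11, 20)),
]
index = EventIndex()
index.build(records)

print(sorted(index.classical("covid_dx")))
print(sorted(index.absolute("covid_dx", date(2020, 3, 1), date(2020, 6, 30))))
print(sorted(index.relative("covid_dx", "ventilation", 14)))
```

In this sketch the sorting cost is paid once in `build`, and each query then touches only the posting lists for the event codes it names instead of scanning every record. That is the general kind of preprocessing-versus-query trade-off the description refers to; the speedups reported in the record apply to the authors' system, not to this toy.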