ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset
Fast temporal query on large EHR-derived data sources presents an emerging big data challenge, as this query modality is intractable using conventional strategies that have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach called Event-level Inverted I...
Main authors: | Huang, Yan; Li, Xiaojin; Zhang, Guo-Qiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Author(s). Published by Elsevier Inc. 2021 |
Subjects: | Original Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9759789/ https://www.ncbi.nlm.nih.gov/pubmed/33775815 http://dx.doi.org/10.1016/j.jbi.2021.103744 |
_version_ | 1784852310614081536 |
---|---|
author | Huang, Yan Li, Xiaojin Zhang, Guo-Qiang |
author_facet | Huang, Yan Li, Xiaojin Zhang, Guo-Qiang |
author_sort | Huang, Yan |
collection | PubMed |
description | Fast temporal query on large EHR-derived data sources presents an emerging big data challenge, as this query modality is intractable using conventional strategies that have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach called Event-level Inverted Index (ELII) to optimize time trade-offs between one-time batch preprocessing and subsequent open-ended, user-specified temporal queries. An experimental temporal query engine has been implemented in a NoSQL database using our new ELII strategy. Near-real-time performance was achieved on a large Covid-19 EHR dataset, with 1.3 million unique patients and 3.76 billion records. We evaluated the performance of ELII on several types of queries: classical (non-temporal), absolute temporal, and relative temporal. Our experimental results indicate that ELII accomplished these queries in seconds, achieving average speed accelerations of 26.8 times on relative temporal query, 88.6 times on absolute temporal query, and 1037.6 times on classical query compared to a baseline approach without using ELII. Our study suggests that ELII is a promising approach supporting fast temporal query, an important mode of cohort development for Covid-19 studies. |
format | Online Article Text |
id | pubmed-9759789 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | The Author(s). Published by Elsevier Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-97597892022-12-19 ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset Huang, Yan Li, Xiaojin Zhang, Guo-Qiang J Biomed Inform Original Research Fast temporal query on large EHR-derived data sources presents an emerging big data challenge, as this query modality is intractable using conventional strategies that have not focused on addressing Covid-19-related research needs at scale. We introduce a novel approach called Event-level Inverted Index (ELII) to optimize time trade-offs between one-time batch preprocessing and subsequent open-ended, user-specified temporal queries. An experimental temporal query engine has been implemented in a NoSQL database using our new ELII strategy. Near-real-time performance was achieved on a large Covid-19 EHR dataset, with 1.3 million unique patients and 3.76 billion records. We evaluated the performance of ELII on several types of queries: classical (non-temporal), absolute temporal, and relative temporal. Our experimental results indicate that ELII accomplished these queries in seconds, achieving average speed accelerations of 26.8 times on relative temporal query, 88.6 times on absolute temporal query, and 1037.6 times on classical query compared to a baseline approach without using ELII. Our study suggests that ELII is a promising approach supporting fast temporal query, an important mode of cohort development for Covid-19 studies. The Author(s). Published by Elsevier Inc. 2021-05 2021-03-26 /pmc/articles/PMC9759789/ /pubmed/33775815 http://dx.doi.org/10.1016/j.jbi.2021.103744 Text en © 2021 The Author(s) Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Original Research Huang, Yan Li, Xiaojin Zhang, Guo-Qiang ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset |
title | ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset |
title_full | ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset |
title_fullStr | ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset |
title_full_unstemmed | ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset |
title_short | ELII: A novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset |
title_sort | elii: a novel inverted index for fast temporal query, with application to a large covid-19 ehr dataset |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9759789/ https://www.ncbi.nlm.nih.gov/pubmed/33775815 http://dx.doi.org/10.1016/j.jbi.2021.103744 |
work_keys_str_mv | AT huangyan eliianovelinvertedindexforfasttemporalquerywithapplicationtoalargecovid19ehrdataset AT lixiaojin eliianovelinvertedindexforfasttemporalquerywithapplicationtoalargecovid19ehrdataset AT zhangguoqiang eliianovelinvertedindexforfasttemporalquerywithapplicationtoalargecovid19ehrdataset |
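
The abstract in this record names three query types (classical, absolute temporal, relative temporal) served by an index built in a one-time batch step. The record does not describe the internals of ELII, so the following is only a minimal in-memory sketch of the general idea of an inverted index keyed by event, not the authors' NoSQL implementation. The record layout `(patient_id, event_code, date)`, the event codes, and all function names are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date, timedelta


def build_index(records):
    """One-time batch step: map each event code to date-sorted parallel lists.

    `records` is an iterable of (patient_id, event_code, date) tuples
    (a hypothetical layout, not taken from the paper).
    """
    grouped = defaultdict(list)
    for patient_id, code, day in records:
        grouped[code].append((day, patient_id))
    index = {}
    for code, postings in grouped.items():
        postings.sort()
        index[code] = ([d for d, _ in postings], [p for _, p in postings])
    return index


def classical_query(index, code):
    """Non-temporal: every patient with at least one `code` event."""
    _, pids = index.get(code, ([], []))
    return set(pids)


def absolute_query(index, code, start, end):
    """Absolute temporal: patients with a `code` event in [start, end]."""
    dates, pids = index.get(code, ([], []))
    lo = bisect_left(dates, start)   # first event on or after start
    hi = bisect_right(dates, end)    # one past the last event on or before end
    return set(pids[lo:hi])


def relative_query(index, first_code, second_code, max_days):
    """Relative temporal: patients whose `second_code` event falls
    0..max_days days after one of their `first_code` events."""
    first_by_patient = defaultdict(list)
    dates, pids = index.get(first_code, ([], []))
    for d, p in zip(dates, pids):
        first_by_patient[p].append(d)  # stays sorted because dates are sorted
    window = timedelta(days=max_days)
    result = set()
    dates2, pids2 = index.get(second_code, ([], []))
    for d2, p in zip(dates2, pids2):
        firsts = first_by_patient.get(p)
        if not firsts:
            continue
        lo = bisect_left(firsts, d2 - window)
        if lo < len(firsts) and firsts[lo] <= d2:
            result.add(p)
    return result


# Toy data with made-up patients and event codes.
records = [
    ("p1", "covid_pos", date(2020, 3, 1)),
    ("p1", "ventilator", date(2020, 3, 5)),
    ("p2", "covid_pos", date(2020, 4, 10)),
    ("p2", "ventilator", date(2020, 6, 1)),
    ("p3", "ventilator", date(2020, 3, 2)),
]
index = build_index(records)
```

With this toy data, `classical_query(index, "covid_pos")` looks up one key, `absolute_query` narrows that key's postings with two binary searches, and `relative_query` joins two postings lists per patient. The sorting cost is paid once in `build_index`, which mirrors the preprocessing-versus-query trade-off the abstract describes.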