
Query bot for retrieving patients’ clinical history: A COVID-19 use-case

OBJECTIVE: With increasing patient complexity whose data are stored in fragmented health information systems, automated and time-efficient ways of gathering important information from the patients' medical history are needed for effective clinical decision making. Using COVID-19 as a case study...


Bibliographic Details
Main authors: Wang, Yibo; Tariq, Amara; Khan, Fiza; Gichoya, Judy Wawira; Trivedi, Hari; Banerjee, Imon
Format: Online Article Text
Language: English
Published: Elsevier Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8454191/
https://www.ncbi.nlm.nih.gov/pubmed/34560275
http://dx.doi.org/10.1016/j.jbi.2021.103918
author Wang, Yibo
Tariq, Amara
Khan, Fiza
Gichoya, Judy Wawira
Trivedi, Hari
Banerjee, Imon
collection PubMed
description OBJECTIVE: With increasing patient complexity whose data are stored in fragmented health information systems, automated and time-efficient ways of gathering important information from the patients' medical history are needed for effective clinical decision making. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user-feedback to allow clinicians to ask natural questions to retrieve data from patient notes. MATERIALS AND METHODS: We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, using K-Means to reduce computation time for real-time interaction. Rocchio algorithm was then employed to incorporate user-feedback and improve retrieval performance. RESULTS: In an iterative feedback loop experiment, MAP for final iteration was 0.93/0.94 as compared to initial MAP of 0.66/0.52 for generic and 1./1. compared to 0.79/0.83 for COVID-19 specific queries confirming that contextual model handles the ambiguity in natural language queries and feedback helps to improve retrieval performance. User-in-loop experiment also outperformed the automated pseudo relevance feedback method. Moreover, the null hypothesis which assumes identical precision between initial retrieval and relevance feedback was rejected with high statistical significance (p ≪ 0.05). Compared to Word2Vec, TF-IDF and bioBERT models, clinicalBERT works optimally considering the balance between response precision and user-feedback. DISCUSSION: Our model works well for generic as well as COVID-19 specific queries. However, some generic queries are not answered as well as others because clustering reduces query performance and vague relations between queries and sentences are considered non-relevant. 
We also tested our model for queries with the same meaning but different expressions and demonstrated that these query variations yielded similar performance after incorporation of user-feedback. CONCLUSION: In conclusion, we develop an NLP-based query-bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User-feedback is critical to improve model performance.
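The feedback loop summarized in the description above can be illustrated with a minimal sketch. It assumes sentence embeddings are already available as NumPy arrays; the toy 2-D vectors below stand in for clinicalBERT embeddings, and the K-Means pre-filtering step is omitted. The function names and the weights alpha, beta, and gamma are illustrative assumptions (common textbook defaults), not values reported in the record.

```python
import numpy as np


def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant vectors and away from non-relevant ones.

    alpha, beta, gamma are assumed textbook defaults, not the authors' settings.
    """
    updated = alpha * query
    if len(relevant) > 0:
        updated = updated + beta * np.mean(relevant, axis=0)
    if len(non_relevant) > 0:
        updated = updated - gamma * np.mean(non_relevant, axis=0)
    return updated


def rank_by_cosine(query, sentences):
    """Return sentence indices ordered from most to least cosine-similar."""
    q = query / np.linalg.norm(query)
    s = sentences / np.linalg.norm(sentences, axis=1, keepdims=True)
    return np.argsort(-(s @ q))


def average_precision(ranking, relevant_ids):
    """Average precision of one ranked list; MAP is the mean over queries."""
    hits, total = 0, 0.0
    for rank, idx in enumerate(ranking, start=1):
        if int(idx) in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


# Toy data (hypothetical): sentences 2 and 3 are the truly relevant ones.
sentences = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
relevant_ids = {2, 3}
query = np.array([0.6, 0.5])

ranking_before = rank_by_cosine(query, sentences)
ap_before = average_precision(ranking_before, relevant_ids)

# Simulated user feedback: sentence 3 marked relevant, sentence 1 non-relevant.
new_query = rocchio_update(query, sentences[[3]], sentences[[1]])
ranking_after = rank_by_cosine(new_query, sentences)
ap_after = average_precision(ranking_after, relevant_ids)

print(ranking_before.tolist(), round(ap_before, 3))
print(ranking_after.tolist(), round(ap_after, 3))
```

In this toy run, one round of feedback moves both relevant sentences to the top of the ranking. In a full system, the same update would presumably be applied to the embedding of the clinician's question, with ranking restricted to sentences in the nearest K-Means clusters.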
format Online
Article
Text
id pubmed-8454191
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier Inc.
record_format MEDLINE/PubMed
spelling pubmed-8454191 2021-09-21 Query bot for retrieving patients’ clinical history: A COVID-19 use-case Wang, Yibo Tariq, Amara Khan, Fiza Gichoya, Judy Wawira Trivedi, Hari Banerjee, Imon J Biomed Inform Original Research Elsevier Inc. 2021-11 2021-09-21 /pmc/articles/PMC8454191/ /pubmed/34560275 http://dx.doi.org/10.1016/j.jbi.2021.103918 Text en © 2021 Elsevier Inc. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Query bot for retrieving patients’ clinical history: A COVID-19 use-case
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8454191/
https://www.ncbi.nlm.nih.gov/pubmed/34560275
http://dx.doi.org/10.1016/j.jbi.2021.103918