
Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records


Bibliographic Details
Main authors: Rahman, Protiva; Ye, Cheng; Mittendorf, Kathleen F; Lenoue-Newton, Michele; Micheel, Christine; Wolber, Jan; Osterman, Travis; Fabbri, Daniel
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10066800/
https://www.ncbi.nlm.nih.gov/pubmed/37012912
http://dx.doi.org/10.1093/jamiaopen/ooad017
Description
Summary:
OBJECTIVE: Automatically identifying patients at risk of immune checkpoint inhibitor (ICI)-induced colitis allows physicians to improve patient care. However, predictive models require training data curated from electronic health records (EHR). Our objective is to automatically identify notes documenting ICI-colitis cases to accelerate data curation.
MATERIALS AND METHODS: We present a data pipeline to automatically identify ICI-colitis from EHR notes, accelerating chart review. The pipeline relies on BERT, a state-of-the-art natural language processing (NLP) model. The first stage of the pipeline segments long notes using keywords identified through a logistic classifier and applies BERT to identify ICI-colitis notes. The next stage uses a second BERT model tuned to identify false positives, removing notes that were likely classified as positive only because they mention colitis as a side effect. The final stage further accelerates curation by highlighting the colitis-relevant portions of notes. Specifically, we use BERT's attention scores to find high-density regions describing colitis.
RESULTS: The overall pipeline identified colitis notes with 84% precision and reduced the curator note review load by 75%. The segment BERT classifier had a high recall of 0.98, which is crucial given the low incidence (<10%) of colitis.
DISCUSSION: Curation from EHR notes is a burdensome task, especially when the curation topic is complicated. The methods described in this work are not only useful for ICI-colitis but can also be adapted for other domains.
CONCLUSION: Our extraction pipeline reduces manual note review load and makes EHR data more accessible for research.
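
The segmentation and highlighting steps described in the methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the keyword list, the segment size, or how high-density regions are computed. The function names `keyword_segments` and `densest_region`, the hard-coded keyword set, and the fixed-width sliding window are all assumptions made for illustration. In the paper the keywords come from a logistic classifier and the per-token scores from BERT attention; here both are simply passed in.

```python
from typing import List, Set, Tuple


def keyword_segments(tokens: List[str], keywords: Set[str],
                     window: int = 5) -> List[Tuple[int, int]]:
    """Return merged [start, end) token spans centered on keyword hits.

    Hypothetical helper: each hit keeps `window` tokens of context on
    either side, so a long note shrinks to a few short segments that
    fit a model with a limited input length.
    """
    spans: List[Tuple[int, int]] = []
    for i, tok in enumerate(tokens):
        if tok.lower().strip(".,;:()") in keywords:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            if spans and start <= spans[-1][1]:
                # Overlapping or touching spans are merged into one segment.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


def densest_region(scores: List[float], width: int) -> Tuple[int, int]:
    """Return [start, end) of the fixed-width window with the largest score sum.

    Hypothetical stand-in for "high-density region": `scores` would be
    per-token attention weights; a sliding-window sum is an assumed,
    simple way to locate where they concentrate.
    """
    if not scores:
        return (0, 0)
    width = min(width, len(scores))
    current = sum(scores[:width])
    best, best_start = current, 0
    for start in range(1, len(scores) - width + 1):
        # Slide the window one token right: add the new token, drop the old one.
        current += scores[start + width - 1] - scores[start - 1]
        if current > best:
            best, best_start = current, start
    return (best_start, best_start + width)
```

In a pipeline of this shape, each span returned by `keyword_segments` would be passed to the note classifier, and `densest_region` would be applied to the attention scores of notes predicted positive to choose the passage shown to the curator.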