
Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non–Muscle-Invasive Bladder Cancer

PURPOSE: Treatment of non–muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic m...


Bibliographic Details
Main Authors: Narayan, Vikram M., Siolas, Despina, Meadows, Eric S., Turzhitsky, Vladimir, Sillah, Arthur, Imai, Kentaro, McMurry, Andrew J., Li, Haojie
Format: Online Article Text
Language: English
Published: Wolters Kluwer Health 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10642898/
https://www.ncbi.nlm.nih.gov/pubmed/37906722
http://dx.doi.org/10.1200/CCI.23.00096
_version_ 1785147040788905984
author Narayan, Vikram M.
Siolas, Despina
Meadows, Eric S.
Turzhitsky, Vladimir
Sillah, Arthur
Imai, Kentaro
McMurry, Andrew J.
Li, Haojie
collection PubMed
description PURPOSE: Treatment of non–muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic medical records (EMRs) and to apply the model to describe patient and tumor characteristics. METHODS: We used three independent EMR-derived data sets including adult patients with a bladder cancer diagnosis in 2011-2020 for NLP model development and training (n = 140), validation (n = 697), and application for the retrospective cohort analysis (n = 4,402). Deep learning methods were used to train NLP recognition of medical chart terminology to identify seven high-risk NMIBC criteria; model performance was assessed using the F1 score, weighted across features. An algorithm was then used to classify each patient as high-risk NMIBC (yes/no). Manually reviewed records served as the gold standard. RESULTS: The F1 scores after model training were >0.7 for all but one uncommon feature (prostatic urethral involvement). The highest area under the receiver operating curves (AUC) was observed for Ta (0.897) and T1 (0.897); the lowest AUC was for carcinoma in situ (CIS; 0.617). For high-risk NMIBC classification, positive predictive value was 79.4%, negative predictive value was 93.2%, and false-positive rate was 8.9%. Sensitivity and specificity were 83.7% and 91.1%, respectively. Of 748 patients manually confirmed as having high-risk NMIBC, 196 (26%) had CIS (of whom 19% also had T1 and 23% also had Ta disease); 552 tumors (74%) had no associated CIS. CONCLUSION: The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC.
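The performance figures in the abstract (sensitivity, specificity, positive and negative predictive value, false-positive rate, and a feature-weighted F1 score) all follow from counts of model labels against the manually reviewed gold standard. The sketch below shows the standard definitions only. It is not the authors' code: the names `ConfusionCounts`, `classification_metrics`, and `weighted_f1` are hypothetical, the example counts are invented for illustration and are not study data, and weighting F1 by each feature's support is an assumption, since the abstract does not say how the weighting was done.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ConfusionCounts:
    """Model labels compared with the manually reviewed gold standard."""
    tp: int  # model says high-risk, chart review agrees
    fp: int  # model says high-risk, chart review disagrees
    tn: int  # model says not high-risk, chart review agrees
    fn: int  # model says not high-risk, chart review disagrees


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "false_positive_rate": c.fp / (c.fp + c.tn),
    }


def weighted_f1(per_feature: Iterable[Tuple[float, float, int]]) -> float:
    """Support-weighted mean of per-feature F1 scores.

    Each item is (precision, recall, support). Weighting by support is
    an assumption; the source does not specify the weighting scheme.
    """
    weighted_sum = 0.0
    total_support = 0
    for precision, recall, support in per_feature:
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom > 0 else 0.0
        weighted_sum += f1 * support
        total_support += support
    if total_support == 0:
        raise ValueError("at least one feature must have non-zero support")
    return weighted_sum / total_support


# Hypothetical counts for illustration only, NOT taken from the study.
example = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because specificity and the false-positive rate share a denominator, they always sum to 1, which matches the reported 91.1% specificity and 8.9% false-positive rate.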
format Online
Article
Text
id pubmed-10642898
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Wolters Kluwer Health
record_format MEDLINE/PubMed
spelling pubmed-10642898 2023-11-14 JCO Clin Cancer Inform ORIGINAL REPORTS Wolters Kluwer Health 2023-10-31 /pmc/articles/PMC10642898/ /pubmed/37906722 http://dx.doi.org/10.1200/CCI.23.00096 Text en © 2023 by American Society of Clinical Oncology https://creativecommons.org/licenses/by-nc-nd/4.0/ Creative Commons Attribution Non-Commercial No Derivatives 4.0 License: https://creativecommons.org/licenses/by-nc-nd/4.0/
title Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non–Muscle-Invasive Bladder Cancer
topic ORIGINAL REPORTS