
RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes


Bibliographic Details
Main authors: Laios, Alexandros; Kalampokis, Evangelos; Mamalis, Marios Evangelos; Tarabanis, Constantine; Nugent, David; Thangavelu, Amudha; Theophilou, Georgios; De Jong, Diederick
Format: Online Article Text
Language: English
Published: SAGE Publications 2023
Subjects: An Inventory of Epithelial Ovarian Cancer Targets: “Evidence-Based” Options
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10624075/
https://www.ncbi.nlm.nih.gov/pubmed/37915208
http://dx.doi.org/10.1177/10732748231209892
author Laios, Alexandros
Kalampokis, Evangelos
Mamalis, Marios Evangelos
Tarabanis, Constantine
Nugent, David
Thangavelu, Amudha
Theophilou, Georgios
De Jong, Diederick
collection PubMed
description INTRODUCTION: Contemporary efforts to predict surgical outcomes focus on associations between traditional discrete surgical risk factors. We aimed to determine whether natural language processing (NLP) of unstructured operative notes improves the prediction of residual disease in women with advanced epithelial ovarian cancer (EOC) following cytoreductive surgery. METHODS: Electronic health records were queried to identify women with advanced EOC, together with their operative notes. The Term Frequency – Inverse Document Frequency (TF-IDF) score was used to quantify how well sequences of words (n-grams) discriminated between cases with and without residual disease. We employed a state-of-the-art RoBERTa-based classifier to process the unstructured surgical notes. Discrimination was measured using standard performance metrics. An XGBoost model was then trained on the same dataset using both discrete and engineered clinical features along with the probabilities output by the RoBERTa classifier. RESULTS: The cohort consisted of 555 cases of EOC cytoreduction performed by eight surgeons between January 2014 and December 2019. Distinct word clouds weighted by the difference in n-gram TF-IDF score between R0 and non-R0 resections were identified. The terms ‘adherent’ and ‘miliary disease’ best discriminated between the two groups. The RoBERTa model achieved high evaluation metrics (AUROC .86; AUPRC .87; precision, recall, and F1 score of .77; accuracy .81). It also outperformed models that used discrete clinical and engineered features, as well as other state-of-the-art NLP tools. When the probabilities from the RoBERTa classifier were combined with commonly used predictors in the XGBoost model, a marginal improvement in overall model performance was observed (AUROC and AUPRC of .91, with all other metrics unchanged). CONCLUSION/IMPLICATIONS: We applied a sui generis approach to extract information from abundant textual surgical data and demonstrated that it can be used effectively for classification, outperforming models that rely on conventional structured data. State-of-the-art NLP applications to biomedical texts can improve modern EOC care.
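The n-gram scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state which TF-IDF variant was used or how the score difference between groups was computed, so the smoothed IDF formula, the per-group averaging, the function names, and the four toy notes below are all assumptions made for illustration. The notes are invented and are not taken from the study data.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lower-cased note."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def tfidf_vectors(notes, n_max=2):
    """Compute one {n-gram: TF-IDF} dict per note (smoothed IDF, an assumption)."""
    counts = [Counter(ngrams(note, n_max)) for note in notes]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(notes)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            term: (freq / total)
                  * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, freq in c.items()
        })
    return vectors

def score_difference(notes, labels, n_max=2):
    """Mean TF-IDF in non-R0 notes (label 1) minus mean in R0 notes (label 0)."""
    vectors = tfidf_vectors(notes, n_max)
    sums = {0: Counter(), 1: Counter()}
    sizes = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, value in vec.items():
            sums[label][term] += value
    terms = set(sums[0]) | set(sums[1])
    return {t: sums[1][t] / sizes[1] - sums[0][t] / sizes[0] for t in terms}

# Hypothetical toy notes (invented for illustration, not study data).
notes = [
    "dense adherent tumour on bowel with miliary disease",
    "miliary disease over diaphragm adherent to liver",
    "complete resection no visible disease",
    "omentectomy performed complete resection achieved",
]
labels = [1, 1, 0, 0]  # 1 = residual disease (non-R0), 0 = R0

diff = score_difference(notes, labels)
# N-grams with the largest positive difference are most typical of non-R0 notes.
for term in sorted(diff, key=diff.get, reverse=True)[:5]:
    print(f"{term}: {diff[term]:+.3f}")
```

In this sketch a positive difference marks an n-gram that is more characteristic of non-R0 notes and a negative one marks an n-gram more characteristic of R0 notes; such differences could then serve as weights for the two word clouds.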
format Online
Article
Text
id pubmed-10624075
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-10624075 2023-11-04 RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes. Cancer Control. An Inventory of Epithelial Ovarian Cancer Targets: “Evidence-Based” Options. SAGE Publications 2023-11-01 /pmc/articles/PMC10624075/ /pubmed/37915208 http://dx.doi.org/10.1177/10732748231209892 Text en © The Author(s) 2023. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes
topic An Inventory of Epithelial Ovarian Cancer Targets: “Evidence-Based” Options
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10624075/
https://www.ncbi.nlm.nih.gov/pubmed/37915208
http://dx.doi.org/10.1177/10732748231209892