RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes
INTRODUCTION: Contemporary efforts to predict surgical outcomes focus on the associations between traditional discrete surgical risk factors. We aimed to determine whether natural language processing (NLP) of unstructured operative notes improves the prediction of residual disease in women with adva...
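Main Authors: | Laios, Alexandros; Kalampokis, Evangelos; Mamalis, Marios Evangelos; Tarabanis, Constantine; Nugent, David; Thangavelu, Amudha; Theophilou, Georgios; De Jong, Diederick |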
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications 2023 |
Subjects: | An Inventory of Epithelial Ovarian Cancer Targets: “Evidence-Based” Options |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10624075/ https://www.ncbi.nlm.nih.gov/pubmed/37915208 http://dx.doi.org/10.1177/10732748231209892 |
_version_ | 1785130855486717952 |
---|---|
author | Laios, Alexandros Kalampokis, Evangelos Mamalis, Marios Evangelos Tarabanis, Constantine Nugent, David Thangavelu, Amudha Theophilou, Georgios De Jong, Diederick |
author_sort | Laios, Alexandros |
collection | PubMed |
description | INTRODUCTION: Contemporary efforts to predict surgical outcomes focus on the associations between traditional discrete surgical risk factors. We aimed to determine whether natural language processing (NLP) of unstructured operative notes improves the prediction of residual disease in women with advanced epithelial ovarian cancer (EOC) following cytoreductive surgery. METHODS: Electronic Health Records were queried to identify women with advanced EOC, including their operative notes. The Term Frequency-Inverse Document Frequency (TF-IDF) score was used to quantify how well sequences of words (n-grams) discriminate between cases with and without residual disease. We employed a state-of-the-art RoBERTa-based classifier to process unstructured surgical notes. Discrimination was measured using standard performance metrics. An XGBoost model was then trained on the same dataset using both discrete and engineered clinical features along with the probabilities output by the RoBERTa classifier. RESULTS: The cohort consisted of 555 cases of EOC cytoreduction performed by eight surgeons between January 2014 and December 2019. Discrete word clouds weighted by the difference in n-gram TF-IDF score between R0 and non-R0 resection were identified. The terms ‘adherent’ and ‘miliary disease’ best discriminated between the two groups. The RoBERTa model reached high evaluation metrics (AUROC .86; AUPRC .87; precision, recall, and F1 score of .77; and accuracy of .81). It also outperformed models that used discrete clinical and engineered features, as well as other state-of-the-art NLP tools. When the probabilities from the RoBERTa classifier were combined with commonly used predictors in the XGBoost model, a marginal improvement in the overall model’s performance was observed (AUROC and AUPRC of .91, with all other metrics the same). CONCLUSION/IMPLICATIONS: We applied a sui generis approach to extract information from the abundant textual surgical data and demonstrated how it can be effectively used for classification prediction, outperforming models relying on conventional structured data. State-of-the-art NLP applications in biomedical texts can improve modern EOC care. |
format | Online Article Text |
id | pubmed-10624075 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-10624075 2023-11-04 RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes. Cancer Control. An Inventory of Epithelial Ovarian Cancer Targets: “Evidence-Based” Options. SAGE Publications 2023-11-01 /pmc/articles/PMC10624075/ /pubmed/37915208 http://dx.doi.org/10.1177/10732748231209892 Text en © The Author(s) 2023. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
title | RoBERTa-Assisted Outcome Prediction in Ovarian Cancer Cytoreductive Surgery Using Operative Notes |
topic | An Inventory of Epithelial Ovarian Cancer Targets: “Evidence-Based” Options |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10624075/ https://www.ncbi.nlm.nih.gov/pubmed/37915208 http://dx.doi.org/10.1177/10732748231209892 |
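
The abstract describes weighting n-grams by the difference in TF-IDF score between R0 and non-R0 resections. The following is a minimal sketch of that idea, not the authors' implementation: the record does not give their tokenisation, n-gram range, or TF-IDF variant, so the smoothed IDF formula, the unigram-plus-bigram range, the function names, and the four toy notes are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from lower-cased text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def tfidf(docs, n_max=2):
    """Compute one {ngram: tf-idf} dict per document.

    tf is the relative frequency within the document; idf is the smoothed
    log((1 + N) / (1 + df)) + 1, one common variant (assumed here).
    """
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n_docs = len(docs)
    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append({t: (k / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                       for t, k in c.items()})
    return scores


def score_difference(docs, labels, n_max=2):
    """Mean TF-IDF per n-gram in label-1 notes minus the mean in label-0 notes."""
    scores = tfidf(docs, n_max)
    diff = Counter()
    for label, sign in ((1, 1), (0, -1)):
        group = [s for s, y in zip(scores, labels) if y == label]
        for s in group:
            for t, v in s.items():
                diff[t] += sign * v / len(group)
    return dict(diff)


# Invented toy notes (not from the study); 1 = residual disease, 0 = R0.
docs = [
    "miliary disease on diaphragm",
    "adherent tumour not resected",
    "no visible disease at end",
    "complete resection no residual tumour",
]
labels = [1, 1, 0, 0]

diff = score_difference(docs, labels)
# N-grams with the largest positive difference lean towards residual disease.
print(sorted(diff, key=diff.get, reverse=True)[:5])
```

A positive difference marks an n-gram as more characteristic of notes with residual disease, a negative one as more characteristic of R0 notes; such differences could then serve as word-cloud weights.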