Predicting health-related quality of life change using natural language processing in thyroid cancer

Bibliographic Details
Main Authors: Lian, Ruixue; Hsiao, Vivian; Hwang, Juwon; Ou, Yue; Robbins, Sarah E.; Connor, Nadine P.; Macdonald, Cameron L.; Sippel, Rebecca S.; Sethares, William A.; Schneider, David F.
Format: Online Article Text
Language: English
Published: 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10473865/
https://www.ncbi.nlm.nih.gov/pubmed/37664403
http://dx.doi.org/10.1016/j.ibmed.2023.100097
author Lian, Ruixue
Hsiao, Vivian
Hwang, Juwon
Ou, Yue
Robbins, Sarah E.
Connor, Nadine P.
Macdonald, Cameron L.
Sippel, Rebecca S.
Sethares, William A.
Schneider, David F.
collection PubMed
description BACKGROUND: Patient-reported outcomes (PRO) allow clinicians to measure health-related quality of life (HRQOL) and understand patients’ treatment priorities, but obtaining PRO requires surveys which are not part of routine care. We aimed to develop a preliminary natural language processing (NLP) pipeline to extract HRQOL trajectory based on deep learning models using patient language.
MATERIALS AND METHODS: Our data consisted of transcribed interviews of 100 patients undergoing surgical intervention for low-risk thyroid cancer, paired with HRQOL assessments completed during the same visits. Our outcome measure was HRQOL trajectory measured by the SF-12 physical and mental component scores (PCS and MCS), and average THYCA-QoL score. We constructed an NLP pipeline based on BERT, a modern deep language model that captures context semantics, to predict HRQOL trajectory as measured by the above endpoints. We compared this to baseline models using logistic regression and support vector machines trained on bag-of-words representations of transcripts obtained using Linguistic Inquiry and Word Count (LIWC). Finally, given the modest dataset size, we implemented two data augmentation methods to improve performance: first by generating synthetic samples via GPT-2, and second by changing the representation of available data via sequence-by-sequence pairing, which is a novel approach.
RESULTS: A BERT-based deep learning model, with GPT-2 synthetic sample augmentation, demonstrated an area-under-curve of 76.3% in the classification of HRQOL accuracy as measured by PCS, compared to the baseline logistic regression and bag-of-words model, which had an AUC of 59.9%. The sequence-by-sequence pairing method for augmentation had an AUC of 71.2% when used with the BERT model.
CONCLUSIONS: NLP methods show promise in extracting PRO from unstructured narrative data, and in the future may aid in assessing and forecasting patients’ HRQOL in response to medical treatments. Our experiments with optimization methods suggest larger amounts of novel data would further improve performance of the classification model.
format Online
Article
Text
id pubmed-10473865
institution National Center for Biotechnology Information
language English
publishDate 2023
record_format MEDLINE/PubMed
spelling pubmed-10473865 2023-09-01 Intell Based Med Article 2023 2023-03-15 /pmc/articles/PMC10473865/ /pubmed/37664403 http://dx.doi.org/10.1016/j.ibmed.2023.100097 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Predicting health-related quality of life change using natural language processing in thyroid cancer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10473865/
https://www.ncbi.nlm.nih.gov/pubmed/37664403
http://dx.doi.org/10.1016/j.ibmed.2023.100097