Predicting health-related quality of life change using natural language processing in thyroid cancer
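Main authors: | Lian, Ruixue; Hsiao, Vivian; Hwang, Juwon; Ou, Yue; Robbins, Sarah E.; Connor, Nadine P.; Macdonald, Cameron L.; Sippel, Rebecca S.; Sethares, William A.; Schneider, David F. |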
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10473865/ https://www.ncbi.nlm.nih.gov/pubmed/37664403 http://dx.doi.org/10.1016/j.ibmed.2023.100097 |
_version_ | 1785100355877470208 |
---|---|
author | Lian, Ruixue Hsiao, Vivian Hwang, Juwon Ou, Yue Robbins, Sarah E. Connor, Nadine P. Macdonald, Cameron L. Sippel, Rebecca S. Sethares, William A. Schneider, David F. |
collection | PubMed |
description | BACKGROUND: Patient-reported outcomes (PRO) allow clinicians to measure health-related quality of life (HRQOL) and understand patients’ treatment priorities, but obtaining PRO requires surveys that are not part of routine care. We aimed to develop a preliminary natural language processing (NLP) pipeline, based on deep learning models, to extract HRQOL trajectory from patient language. MATERIALS AND METHODS: Our data consisted of transcribed interviews with 100 patients undergoing surgery for low-risk thyroid cancer, paired with HRQOL assessments completed during the same visits. Our outcome measure was HRQOL trajectory, measured by the SF-12 physical and mental component scores (PCS and MCS) and the average THYCA-QoL score. We constructed an NLP pipeline based on BERT, a modern deep language model that captures contextual semantics, to predict HRQOL trajectory as measured by these endpoints. We compared it with baseline models using logistic regression and support vector machines trained on bag-of-words representations of the transcripts obtained with Linguistic Inquiry and Word Count (LIWC). Finally, given the modest dataset size, we implemented two data augmentation methods to improve performance: first, generating synthetic samples with GPT-2; second, changing the representation of the available data via sequence-by-sequence pairing, a novel approach. RESULTS: A BERT-based deep learning model with GPT-2 synthetic sample augmentation achieved an area under the curve (AUC) of 76.3% in classifying HRQOL trajectory as measured by PCS, compared with an AUC of 59.9% for the baseline logistic regression and bag-of-words model. The sequence-by-sequence pairing augmentation method achieved an AUC of 71.2% when used with the BERT model. CONCLUSIONS: NLP methods show promise for extracting PRO from unstructured narrative data and may in the future aid in assessing and forecasting patients’ HRQOL in response to medical treatments. Our experiments with optimization methods suggest that larger amounts of novel data would further improve the performance of the classification model. |
format | Online Article Text |
id | pubmed-10473865 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
record_format | MEDLINE/PubMed |
spelling | pubmed-104738652023-09-01 Predicting health-related quality of life change using natural language processing in thyroid cancer Lian, Ruixue Hsiao, Vivian Hwang, Juwon Ou, Yue Robbins, Sarah E. Connor, Nadine P. Macdonald, Cameron L. Sippel, Rebecca S. Sethares, William A. Schneider, David F. Intell Based Med Article BACKGROUND: Patient-reported outcomes (PRO) allow clinicians to measure health-related quality of life (HRQOL) and understand patients’ treatment priorities, but obtaining PRO requires surveys which are not part of routine care. We aimed to develop a preliminary natural language processing (NLP) pipeline to extract HRQOL trajectory based on deep learning models using patient language. MATERIALS AND METHODS: Our data consisted of transcribed interviews of 100 patients undergoing surgical intervention for low-risk thyroid cancer, paired with HRQOL assessments completed during the same visits. Our outcome measure was HRQOL trajectory measured by the SF-12 physical and mental component scores (PCS and MCS), and average THYCA-QoL score. We constructed an NLP pipeline based on BERT, a modern deep language model that captures context semantics, to predict HRQOL trajectory as measured by the above endpoints. We compared this to baseline models using logistic regression and support vector machines trained on bag-of-words representations of transcripts obtained using Linguistic Inquiry and Word Count (LIWC). Finally, given the modest dataset size, we implemented two data augmentation methods to improve performance: first by generating synthetic samples via GPT-2, and second by changing the representation of available data via sequence-by-sequence pairing, which is a novel approach. 
RESULTS: A BERT-based deep learning model, with GPT-2 synthetic sample augmentation, demonstrated an area-under-curve of 76.3% in the classification of HRQOL accuracy as measured by PCS, compared to the baseline logistic regression and bag-of-words model, which had an AUC of 59.9%. The sequence-by-sequence pairing method for augmentation had an AUC of 71.2% when used with the BERT model. CONCLUSIONS: NLP methods show promise in extracting PRO from unstructured narrative data, and in the future may aid in assessing and forecasting patients’ HRQOL in response to medical treatments. Our experiments with optimization methods suggest larger amounts of novel data would further improve performance of the classification model. 2023 2023-03-15 /pmc/articles/PMC10473865/ /pubmed/37664403 http://dx.doi.org/10.1016/j.ibmed.2023.100097 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Predicting health-related quality of life change using natural language processing in thyroid cancer |
topic | Article |
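The abstract reports its main results as AUC values (76.3% for the BERT model with GPT-2 augmentation, 71.2% with sequence-by-sequence pairing, and 59.9% for the logistic regression and bag-of-words baseline). AUC can be read as the probability that a classifier scores a randomly chosen positive case above a randomly chosen negative one. The following is a minimal Python sketch of that pairwise (Mann-Whitney) formulation of ROC AUC, offered only to illustrate the metric; the function name `roc_auc`, the binary label coding, and all scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney U) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive class, 0 = negative class) and model
# scores, invented purely for illustration; they are not data from the study.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

With these invented scores, eight of the nine positive/negative pairs are ranked correctly, so the sketch prints `AUC = 0.889`. The quadratic pairwise loop is chosen for readability; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.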