
Extracting Multiple Worries From Breast Cancer Patient Blogs Using Multilabel Classification With the Natural Language Processing Model Bidirectional Encoder Representations From Transformers: Infodemiology Study of Blogs

BACKGROUND: Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to...


Bibliographic Details
Main authors: Watanabe, Tomomi, Yada, Shuntaro, Aramaki, Eiji, Yajima, Hiroshi, Kizaki, Hayato, Hori, Satoko
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206207/
https://www.ncbi.nlm.nih.gov/pubmed/35657664
http://dx.doi.org/10.2196/37840
author Watanabe, Tomomi
Yada, Shuntaro
Aramaki, Eiji
Yajima, Hiroshi
Kizaki, Hayato
Hori, Satoko
collection PubMed
description
BACKGROUND: Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to help patients with breast cancer to resolve their worries and obtain reliable information.
OBJECTIVE: This study aimed to extract and classify multiple worries from text generated by patients with breast cancer using Bidirectional Encoder Representations From Transformers (BERT), a context-aware natural language processing model.
METHODS: A total of 2272 blog posts by patients with breast cancer in Japan were collected. Five worry labels, “treatment,” “physical,” “psychological,” “work/financial,” and “family/friends,” were defined and assigned to each post. Multiple labels were allowed. To assess the label criteria, 50 blog posts were randomly selected and annotated by two researchers with medical knowledge. After the interannotator agreement had been assessed by means of Cohen kappa, one researcher annotated all the blogs. A multilabel classifier that simultaneously predicts five worries in a text was developed using BERT. This classifier was fine-tuned by using the posts as input and adding a classification layer to the pretrained BERT. The performance was evaluated for precision using the average of 5-fold cross-validation results.
RESULTS: Among the blog posts, 477 included “treatment,” 1138 included “physical,” 673 included “psychological,” 312 included “work/financial,” and 283 included “family/friends.” The interannotator agreement values were 0.67 for “treatment,” 0.76 for “physical,” 0.56 for “psychological,” 0.73 for “work/financial,” and 0.73 for “family/friends,” indicating a high degree of agreement. Among all blog posts, 544 contained no label, 892 contained one label, and 836 contained multiple labels. It was found that the worries varied from user to user, and the worries posted by the same user changed over time. The model performed well, though prediction performance differed for each label. The values of precision were 0.59 for “treatment,” 0.82 for “physical,” 0.64 for “psychological,” 0.67 for “work/financial,” and 0.58 for “family/friends.” The higher the interannotator agreement and the greater the number of posts, the higher the precision tended to be.
CONCLUSIONS: This study showed that the BERT model can extract multiple worries from text generated from patients with breast cancer. This is the first application of a multilabel classifier using the BERT model to extract multiple worries from patient-generated text. The results will be helpful to identify breast cancer patients’ worries and give them timely social support.
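The abstract reports two per-label quantities: Cohen kappa between two annotators and precision of the classifier's predictions. The record does not include the authors' code, so the following is only a minimal sketch of how such values could be computed from binary label vectors. The function names, the `LABELS` ordering, and the handling of edge cases are assumptions made for illustration, and the 5-fold averaging step described in the abstract is not shown.

```python
from typing import Dict, List, Sequence

# The five worry labels named in the abstract; the column order is an assumption.
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators' binary (0/1) decisions on one label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal rate of assigning the label.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Degenerate case (both annotators used one identical class throughout).
        # Returning 1.0 is a convention chosen for this sketch only.
        return 1.0
    return (observed - expected) / (1 - expected)


def per_label_precision(
    y_true: List[Sequence[int]], y_pred: List[Sequence[int]]
) -> Dict[str, float]:
    """Precision = TP / (TP + FP), computed separately for each label column."""
    result = {}
    for j, label in enumerate(LABELS):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 0)
        # If the label was never predicted, precision is undefined; report 0.0 here.
        result[label] = tp / (tp + fp) if (tp + fp) else 0.0
    return result
```

Each post is represented as a vector of five 0/1 values, one per label, so a post with no worries is all zeros and a post with several worries has several ones. Computing the metrics per column rather than per post matches the per-label values quoted in the abstract.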
format Online
Article
Text
id pubmed-9206207
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9206207 2022-06-19
JMIR Cancer Original Paper
JMIR Publications 2022-06-03 /pmc/articles/PMC9206207/ /pubmed/35657664 http://dx.doi.org/10.2196/37840 Text en
©Tomomi Watanabe, Shuntaro Yada, Eiji Aramaki, Hiroshi Yajima, Hayato Kizaki, Satoko Hori. Originally published in JMIR Cancer (https://cancer.jmir.org), 03.06.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Cancer, is properly cited. The complete bibliographic information, a link to the original publication on https://cancer.jmir.org/, as well as this copyright and license information must be included.
topic Original Paper