Classifying patient and professional voice in social media health posts

Bibliographic Details
Main authors: Alex, Beatrice; Whyte, Donald; Duma, Daniel; Owen, Roma English; Fairley, Elizabeth A. L.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8371035/
https://www.ncbi.nlm.nih.gov/pubmed/34407807
http://dx.doi.org/10.1186/s12911-021-01577-9
collection PubMed
description BACKGROUND: Patient-based analysis of social media is a growing research field aimed at delivering precision medicine, but it requires accurate classification of posts relating to patients’ experiences. We motivate the need for this type of classification as a pre-processing step for further analysis of social media data, in the context of related work in this area. In this paper we present experiments on three-way document classification into patient voice, professional voice or other. We report results for a convolutional neural network classifier trained on English data from two data sources (Reddit and Twitter) and two domains (cardiovascular and skin diseases). RESULTS: We found that human annotators can classify documents as patient voice, professional voice or other consistently (0.92 accuracy). Annotator agreement was roughly equal across the two domains (cardiovascular and skin) but higher for Reddit posts than for Twitter posts. The best classification performance was obtained by training a separate classifier for each data source, one for Reddit and one for Twitter posts, and evaluating each on in-source test data: on both test sets combined this gave an overall accuracy of 0.95 (macro-averaged F1 of 0.92) and an F1-score of 0.95 for the patient voice class alone. CONCLUSION: The main conclusion of this work is that combining social media data from platforms with different characteristics to train a patient and professional voice classifier does not yield the best possible performance. We showed that it is better to train separate models per data source (Reddit and Twitter) than a single model on the combined training data from both sources. We also found that separate models per domain (cardiovascular and skin) are preferable, although the difference from the combined model is minor (0.01 accuracy). Our highest F1-score for the patient voice class (0.95) is a very good starting point for further analysis of social media data reflecting patients’ experiences. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01577-9.
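The abstract reports both an overall accuracy (0.95) and a lower macro-averaged F1 (0.92) for the three-way task. The minimal Python sketch below shows how these metrics relate; it is not the authors' evaluation code. The label names patient, professional and other and the ten toy posts are hypothetical and do not come from the paper's data. Accuracy counts every post equally, while macro-F1 is the unweighted mean of the per-class F1-scores, so an error on a small class weighs more heavily in macro-F1.

```python
"""Accuracy, per-class F1 and macro-averaged F1 for a three-way classification.

Illustrative sketch only: the label names and toy data are hypothetical.
"""

LABELS = ("patient", "professional", "other")  # hypothetical label names


def accuracy(gold, pred):
    """Fraction of posts whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_f1(gold, pred, labels=LABELS):
    """Map each label to its F1-score, using 0.0 when it is undefined."""
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(gold, pred, labels)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Ten hypothetical posts: one professional-voice post is misread as patient voice.
    gold = ["patient"] * 6 + ["professional"] * 2 + ["other"] * 2
    pred = ["patient"] * 7 + ["professional"] * 1 + ["other"] * 2
    print(round(accuracy(gold, pred), 2))                 # 0.9
    print(round(per_class_f1(gold, pred)["patient"], 2))  # 0.92
    print(round(macro_f1(gold, pred), 2))                 # 0.86
```

In this toy case a single error on the two-post professional class costs 0.10 accuracy but pulls macro-F1 down to about 0.86. This illustrates how a macro-average can sit below overall accuracy, as it does in the abstract (0.92 vs 0.95).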
id pubmed-8371035
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8371035, 2021-08-18. BMC Med Inform Decis Mak, Research. BioMed Central, 2021-08-18. Text (en). © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research