Classifying patient and professional voice in social media health posts
BACKGROUND: Patient-based analysis of social media is a growing research field with the aim of delivering precision medicine but it requires accurate classification of posts relating to patients’ experiences. We motivate the need for this type of classification as a pre-processing step for further a...
Main authors: | Alex, Beatrice; Whyte, Donald; Duma, Daniel; Owen, Roma English; Fairley, Elizabeth A. L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8371035/ https://www.ncbi.nlm.nih.gov/pubmed/34407807 http://dx.doi.org/10.1186/s12911-021-01577-9 |
_version_ | 1783739558771294208 |
---|---|
author | Alex, Beatrice Whyte, Donald Duma, Daniel Owen, Roma English Fairley, Elizabeth A. L. |
author_sort | Alex, Beatrice |
collection | PubMed |
description | BACKGROUND: Patient-based analysis of social media is a growing research field with the aim of delivering precision medicine but it requires accurate classification of posts relating to patients’ experiences. We motivate the need for this type of classification as a pre-processing step for further analysis of social media data in the context of related work in this area. In this paper we present experiments for a three-way document classification by patient voice, professional voice or other. We present results for a convolutional neural network classifier trained on English data from two different data sources (Reddit and Twitter) and two domains (cardiovascular and skin diseases). RESULTS: We found that document classification by patient voice, professional voice or other can be done consistently manually (0.92 accuracy). Annotators agreed roughly equally for each domain (cardiovascular and skin) but they agreed more when annotating Reddit posts compared to Twitter posts. Best classification performance was obtained when training two separate classifiers for each data source, one for Reddit and one for Twitter posts, when evaluating on in-source test data for both test sets combined with an overall accuracy of 0.95 (and macro-average F1 of 0.92) and an F1-score of 0.95 for patient voice only. CONCLUSION: The main conclusion resulting from this work is that combining social media data from platforms with different characteristics for training a patient and professional voice classifier does not result in best possible performance. We showed that it is best to train separate models per data source (Reddit and Twitter) instead of a model using the combined training data from both sources. We also found that it is preferable to train separate models per domain (cardiovascular and skin) while showing that the difference to the combined model is only minor (0.01 accuracy). Our highest overall F1-score (0.95) obtained for classifying posts as patient voice is a very good starting point for further analysis of social media data reflecting the experience of patients. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01577-9. |
format | Online Article Text |
id | pubmed-8371035 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8371035 2021-08-18 Classifying patient and professional voice in social media health posts Alex, Beatrice Whyte, Donald Duma, Daniel Owen, Roma English Fairley, Elizabeth A. L. BMC Med Inform Decis Mak Research BioMed Central 2021-08-18 /pmc/articles/PMC8371035/ /pubmed/34407807 http://dx.doi.org/10.1186/s12911-021-01577-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Classifying patient and professional voice in social media health posts |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8371035/ https://www.ncbi.nlm.nih.gov/pubmed/34407807 http://dx.doi.org/10.1186/s12911-021-01577-9 |