Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets
Main authors: | Du, Jingcheng; Xu, Jun; Song, Hsingyi; Liu, Xiangyu; Tao, Cui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335787/ https://www.ncbi.nlm.nih.gov/pubmed/28253919 http://dx.doi.org/10.1186/s13326-017-0120-6 |
author | Du, Jingcheng; Xu, Jun; Song, Hsingyi; Liu, Xiangyu; Tao, Cui |
collection | PubMed |
description | BACKGROUND: Analysing public opinion on HPV vaccines on social media with machine-learning-based approaches will help us understand the reasons behind low vaccine coverage and develop corresponding strategies to improve vaccine uptake. OBJECTIVE: To propose a machine learning system that can extract comprehensive public sentiment on HPV vaccines from Twitter with satisfactory performance. METHOD: We collected and manually annotated 6,000 HPV vaccine-related tweets as a gold standard. An SVM model was chosen, and a hierarchical classification method was proposed and evaluated. Additional feature sets were evaluated and model parameters were optimized to maximize the performance of the machine learning model. RESULTS: A hierarchical classification scheme containing 10 categories was built to assess public opinion toward HPV vaccines comprehensively. A gold-standard corpus of 6,000 annotated tweets, with a Kappa annotation agreement of 0.851, was created and made publicly available. Compared with the baseline model, the hierarchical classification model with optimized feature sets and model parameters increased the micro-averaged and macro-averaged F scores from 0.6732 and 0.3967 to 0.7442 and 0.5883, respectively. CONCLUSIONS: Our work provides a systematic way to improve machine learning model performance on the highly unbalanced corpus of HPV vaccine-related tweets. Our system can be further applied to a large tweet corpus to extract large-scale public opinion towards HPV vaccines. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13326-017-0120-6) contains supplementary material, which is available to authorized users. |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | J Biomed Semantics |
publication date | 2017-03-03 |
rights | © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Research |
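The abstract reports both micro-averaged and macro-averaged F scores on a highly unbalanced corpus. This minimal sketch shows why the two can diverge so sharply. It computes both scores for single-label predictions in plain Python. It is not the authors' evaluation code, and the label names and toy data are hypothetical, not taken from the paper's corpus.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Micro- and macro-averaged F1 for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted label gets a false positive
            fn[t] += 1  # the true label gets a false negative
    labels = sorted(set(y_true) | set(y_pred))
    # Micro: pool counts across all labels, so frequent labels dominate the score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-label F1, so rare labels weigh as much as common ones.
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    return micro, macro


# Hypothetical, deliberately unbalanced toy labels (not from the paper's corpus).
y_true = ["neutral"] * 8 + ["positive", "negative_safety"]
y_pred = ["neutral"] * 10  # a classifier that always predicts the majority label

micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.296
```

In this toy case, always predicting the majority label gives a micro-averaged F1 of 0.8 but a macro-averaged F1 of about 0.296. The two rare labels each score an F1 of 0, and that drags down the macro average. For single-label multiclass data, micro-averaged F1 over all labels equals accuracy. This is why the macro-averaged score is the more sensitive measure of performance on minority categories.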