Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets

Bibliographic Details
Main Authors: Du, Jingcheng; Xu, Jun; Song, Hsingyi; Liu, Xiangyu; Tao, Cui
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335787/
https://www.ncbi.nlm.nih.gov/pubmed/28253919
http://dx.doi.org/10.1186/s13326-017-0120-6
_version_ 1782512105717497856
author Du, Jingcheng
Xu, Jun
Song, Hsingyi
Liu, Xiangyu
Tao, Cui
author_facet Du, Jingcheng
Xu, Jun
Song, Hsingyi
Liu, Xiangyu
Tao, Cui
author_sort Du, Jingcheng
collection PubMed
description BACKGROUND: Analysing public opinions on HPV vaccines on social media using machine learning based approaches will help us understand the reasons behind the low vaccine coverage and come up with corresponding strategies to improve vaccine uptake. OBJECTIVE: To propose a machine learning system that is able to extract comprehensive public sentiment on HPV vaccines on Twitter with satisfactory performance. METHOD: We collected and manually annotated 6,000 HPV vaccines related tweets as a gold standard. An SVM model was chosen, and a hierarchical classification method was proposed and evaluated. Additional feature sets were evaluated and model parameters were optimized to maximize the performance of the machine learning model. RESULTS: A hierarchical classification scheme containing 10 categories was built to assess public opinions toward HPV vaccines comprehensively. A gold corpus of 6,000 annotated tweets, with a Kappa annotation agreement of 0.851, was created and made publicly available. Compared with the baseline model, the hierarchical classification model with optimized feature sets and model parameters increased the micro-averaging and macro-averaging F scores from 0.6732 and 0.3967 to 0.7442 and 0.5883, respectively. CONCLUSIONS: Our work provides a systematic way to improve machine learning model performance on the highly unbalanced HPV vaccines related tweets corpus. Our system can be further applied to a large tweet corpus to extract large-scale public opinion towards HPV vaccines. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13326-017-0120-6) contains supplementary material, which is available to authorized users.
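The abstract reports both micro-averaged and macro-averaged F scores, and the gap between them (for example 0.6732 versus 0.3967 for the baseline) is typical of a highly unbalanced corpus. The following is a minimal sketch of how the two averages differ, not the authors' evaluation code. The function name `f_scores`, the three label names, and the toy data are hypothetical; the record does not list the 10 categories used in the paper.

```python
from collections import Counter


def f_scores(gold, pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[g] += 1  # gold class g gains a false negative

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Micro-averaging pools the counts over all classes before computing F1,
    # so frequent classes dominate the result.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-averaging computes F1 per class and takes the unweighted mean,
    # so every class counts equally regardless of its size.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: an unbalanced corpus and a classifier that
# always predicts the majority class.
gold = ["Neutral"] * 8 + ["Positive", "Negative"]
pred = ["Neutral"] * 10

micro, macro = f_scores(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the majority-class predictor still reaches a micro F1 of 0.8 while the macro F1 drops to about 0.30, because the two minority classes each contribute an F1 of zero. This is why the macro-averaged score is the more sensitive indicator of minority-class performance on an unbalanced corpus.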
format Online
Article
Text
id pubmed-5335787
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5335787 2017-03-07 Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets Du, Jingcheng Xu, Jun Song, Hsingyi Liu, Xiangyu Tao, Cui J Biomed Semantics Research
BioMed Central 2017-03-03 /pmc/articles/PMC5335787/ /pubmed/28253919 http://dx.doi.org/10.1186/s13326-017-0120-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Du, Jingcheng
Xu, Jun
Song, Hsingyi
Liu, Xiangyu
Tao, Cui
Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets
title Optimization on machine learning based approaches for sentiment analysis on HPV vaccines related tweets
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335787/
https://www.ncbi.nlm.nih.gov/pubmed/28253919
http://dx.doi.org/10.1186/s13326-017-0120-6