
Identifying health related occupations of Twitter users through word embedding and deep neural networks

Bibliographic Details
Main Authors: Zainab, Kazi; Srivastava, Gautam; Mago, Vijay
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520792/
https://www.ncbi.nlm.nih.gov/pubmed/36171569
http://dx.doi.org/10.1186/s12859-022-04933-2
author Zainab, Kazi
Srivastava, Gautam
Mago, Vijay
collection PubMed
description BACKGROUND: Twitter is a popular social networking site whose short messages, or “tweets,” have been used extensively for research. However, little research has been done on mining medical professions, such as detecting users' occupations from their biographical content. Mining such professions can help build efficient recommender systems for cost-effective targeted advertising. Developing effective methods to identify users' occupations is also important because conventional classification methods rely on features engineered by humans. Although such features may give favorable results on the classification problem, it remains extremely challenging for traditional classifiers to predict medical occupations accurately, since the task involves predicting multiple occupations. Hence, this study focuses on predicting the medical occupational class of users from their public biographical (“Bio”) content. We conducted our analysis by annotating the bio content of Twitter users. In this paper, we propose a method that combines word embedding with state-of-the-art neural network models: Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT). We also observed that combining word embedding with the neural network models removes the need to construct any particular attribute or feature. With word embedding, the bio contents are represented as dense vectors, which are fed into the neural network models as a sequence of vectors. RESULT: Performance metrics, including accuracy, precision, recall, and F1-score, show a significant difference between our method of combining word embedding with neural network models and the traditional methods.
The scores show that our proposed approach outperforms traditional machine learning techniques for detecting medical occupations among users. ALBERT performed best among the deep learning networks, with an F1-score of 0.90. CONCLUSION: In this study, we have presented a novel method for detecting the occupations of Twitter users engaged in the medical domain by merging word embedding with state-of-the-art neural networks. The outcomes demonstrate that our method can further advance the analysis of social media corpora without the trouble of developing computationally expensive features.
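The description says each bio is turned into a sequence of dense vectors before it reaches the neural network. The sketch below illustrates that step with a plain embedding lookup. It is not the authors' pipeline. The regex tokenizer, the toy vocabulary, the random embedding table (standing in for trained word vectors), and the hypothetical `EMBED_DIM` and `MAX_LEN` values are all assumptions made for illustration.

```python
import random
import re

# Hypothetical settings: the record does not state the tokenizer,
# vocabulary, embedding size, or maximum sequence length used in the paper.
EMBED_DIM = 4
MAX_LEN = 8
PAD, OOV = "<pad>", "<oov>"


def tokenize(bio: str) -> list[str]:
    """Lowercase a bio and split it into word tokens with a simple regex."""
    return re.findall(r"[a-z0-9#@']+", bio.lower())


def build_vocab(bios: list[str]) -> dict[str, int]:
    """Assign an integer id to every token seen in the bios (ids 0 and 1 are reserved)."""
    vocab = {PAD: 0, OOV: 1}
    for bio in bios:
        for tok in tokenize(bio):
            vocab.setdefault(tok, len(vocab))
    return vocab


def random_embeddings(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Placeholder embedding matrix with one dense row per token id.

    The values are random, not learned; a real model would use trained
    or pre-trained word vectors here.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    table[0] = [0.0] * dim  # padding row is all zeros
    return table


def bio_to_sequence(bio: str, vocab: dict[str, int],
                    table: list[list[float]], max_len: int = MAX_LEN) -> list[list[float]]:
    """Convert one bio into a fixed-length sequence of dense vectors."""
    ids = [vocab.get(tok, vocab[OOV]) for tok in tokenize(bio)][:max_len]  # truncate long bios
    ids += [vocab[PAD]] * (max_len - len(ids))  # right-pad short bios
    return [table[i] for i in ids]


if __name__ == "__main__":
    bios = ["Registered nurse | ICU | coffee lover", "Pediatrician and mom of two"]
    vocab = build_vocab(bios)
    table = random_embeddings(len(vocab), EMBED_DIM)
    seq = bio_to_sequence("ICU nurse and runner", vocab, table)
    print(len(seq), len(seq[0]))  # 8 4
```

A sequence like `seq` is the kind of input an LSTM, BiLSTM, or GRU layer consumes one vector per time step. BERT and ALBERT differ: they apply their own subword tokenizers and learned input embeddings, so this word-level lookup only illustrates the recurrent-model setting.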
format Online
Article
Text
id pubmed-9520792
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9520792 2022-09-30 BMC Bioinformatics Research BioMed Central 2022-09-28 /pmc/articles/PMC9520792/ /pubmed/36171569 http://dx.doi.org/10.1186/s12859-022-04933-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Identifying health related occupations of Twitter users through word embedding and deep neural networks
topic Research