Identifying health related occupations of Twitter users through word embedding and deep neural networks
BACKGROUND: Twitter is a popular social networking site where short messages or “tweets” of users have been used extensively for research purposes. However, not much research has been done in mining the medical professions, such as detecting the occupations of users from their biographical contents....
Main authors: | Zainab, Kazi; Srivastava, Gautam; Mago, Vijay |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520792/ https://www.ncbi.nlm.nih.gov/pubmed/36171569 http://dx.doi.org/10.1186/s12859-022-04933-2 |
_version_ | 1784799705295749120 |
---|---|
author | Zainab, Kazi Srivastava, Gautam Mago, Vijay |
author_facet | Zainab, Kazi Srivastava, Gautam Mago, Vijay |
author_sort | Zainab, Kazi |
collection | PubMed |
description | BACKGROUND: Twitter is a popular social networking site whose short messages, or “tweets”, have been used extensively for research purposes. However, little research has been done on mining medical professions, such as detecting the occupations of users from their biographical content. Mining such professions can be used to build efficient recommender systems for cost-effective targeted advertisements. Moreover, it is important to develop effective methods to identify the occupation of users, since conventional classification methods rely on features developed by human intelligence. Although the results of such methods may be favorable for the classification problem, it is still extremely challenging for traditional classifiers to predict medical occupations accurately, since the task involves predicting multiple occupations. Hence, this study emphasizes predicting the medical occupational class of users through their public biographical (“Bio”) content. We conducted our analysis by annotating the bio content of Twitter users. In this paper, we propose a method of combining word embedding with state-of-the-art neural network models that include Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit, Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT). We also observed that by composing the word embedding with the neural network models, there is no need to construct any particular attribute or feature. By using word embedding, the bio contents are formatted as dense vectors, which are fed as input into the neural network models as a sequence of vectors. RESULT: Performance metrics that include accuracy, precision, recall, and F1-score show a significant difference between our method of combining word embedding with neural network models and the traditional methods. The scores show that our proposed approach outperformed the traditional machine learning techniques for detecting medical occupations among users. ALBERT performed the best among the deep learning networks, with an F1 score of 0.90. CONCLUSION: In this study, we presented a novel method of detecting the occupations of Twitter users engaged in the medical domain by merging word embedding with state-of-the-art neural networks. The outcomes of our approach demonstrate that our method can further advance the process of analyzing corpora of social media without going through the trouble of developing computationally expensive features. |
format | Online Article Text |
id | pubmed-9520792 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9520792 2022-09-30 Identifying health related occupations of Twitter users through word embedding and deep neural networks Zainab, Kazi Srivastava, Gautam Mago, Vijay BMC Bioinformatics Research BioMed Central 2022-09-28 /pmc/articles/PMC9520792/ /pubmed/36171569 http://dx.doi.org/10.1186/s12859-022-04933-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Zainab, Kazi Srivastava, Gautam Mago, Vijay Identifying health related occupations of Twitter users through word embedding and deep neural networks |
title | Identifying health related occupations of Twitter users through word embedding and deep neural networks |
title_full | Identifying health related occupations of Twitter users through word embedding and deep neural networks |
title_fullStr | Identifying health related occupations of Twitter users through word embedding and deep neural networks |
title_full_unstemmed | Identifying health related occupations of Twitter users through word embedding and deep neural networks |
title_short | Identifying health related occupations of Twitter users through word embedding and deep neural networks |
title_sort | identifying health related occupations of twitter users through word embedding and deep neural networks |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520792/ https://www.ncbi.nlm.nih.gov/pubmed/36171569 http://dx.doi.org/10.1186/s12859-022-04933-2 |
work_keys_str_mv | AT zainabkazi identifyinghealthrelatedoccupationsoftwitterusersthroughwordembeddinganddeepneuralnetworks AT srivastavagautam identifyinghealthrelatedoccupationsoftwitterusersthroughwordembeddinganddeepneuralnetworks AT magovijay identifyinghealthrelatedoccupationsoftwitterusersthroughwordembeddinganddeepneuralnetworks |
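
The description above says that bio contents are turned into dense vectors and fed to the networks as a sequence of vectors, and that results are reported as precision, recall, and F1-score. The following is a minimal sketch of those two ideas only, not the authors' implementation. Everything in it is an assumption made for illustration: the random embedding table stands in for trained word embeddings, the `<pad>` and `<unk>` tokens, the whitespace tokenizer, the function names, and the macro averaging are hypothetical choices, and the example labels are invented.

```python
import random


def build_embeddings(vocab, dim, seed=0):
    """Give each token a random dense vector (a stand-in for trained embeddings)."""
    rng = random.Random(seed)
    table = {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}
    table["<pad>"] = [0.0] * dim  # hypothetical padding token
    table["<unk>"] = [0.0] * dim  # hypothetical out-of-vocabulary token
    return table


def bio_to_sequence(bio, table, max_len):
    """Map a bio string to a fixed-length sequence of dense vectors."""
    tokens = bio.lower().split()[:max_len]          # naive whitespace tokenizer
    tokens += ["<pad>"] * (max_len - len(tokens))   # pad short bios
    return [table.get(tok, table["<unk>"]) for tok in tokens]


def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores over all classes seen in either list."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    table = build_embeddings(["icu", "nurse", "at"], dim=4)
    sequence = bio_to_sequence("ICU nurse at StMary", table, max_len=6)
    print(len(sequence), len(sequence[0]))  # sequence length and vector size

    truth = ["doctor", "nurse", "nurse", "pharmacist"]
    guess = ["doctor", "nurse", "doctor", "pharmacist"]
    print(round(macro_f1(truth, guess), 4))
```

In a real pipeline the sequence returned by `bio_to_sequence` would be the input to a recurrent or transformer model, and the embedding table would be learned or pretrained rather than random.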