Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts
We analyze the performance of unsupervised embedding algorithms in sentiment analysis of knowledge-rich data sets. We apply the state-of-the-art embedding algorithms Word2Vec and Doc2Vec as the learning techniques. The algorithms build word and document embeddings in an unsupervised manner. To assess the algor...
Main Authors: | Chen, Qufei; Sokolova, Marina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Singapore 2021 |
Subjects: | Original Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8364743/ https://www.ncbi.nlm.nih.gov/pubmed/34414378 http://dx.doi.org/10.1007/s42979-021-00807-1 |
_version_ | 1783738577452007424 |
---|---|
author | Chen, Qufei Sokolova, Marina |
author_sort | Chen, Qufei |
collection | PubMed |
description | We analyze the performance of unsupervised embedding algorithms in sentiment analysis of knowledge-rich data sets. We apply the state-of-the-art embedding algorithms Word2Vec and Doc2Vec as the learning techniques. The algorithms build word and document embeddings in an unsupervised manner. To assess the algorithms’ performance, we define sentiment metrics and use the semantic lexicon SentiWordNet (SWN) to establish the benchmark measures. Our empirical results are obtained on the Obesity data set from the i2b2 clinical discharge summaries and on the Reuters Science data set. We use Welch’s test to analyze the obtained sentiment evaluations. On the Obesity data, Welch’s test found a significant difference between the SWN evaluations of the most positive and most negative texts. On the same data, the Word2Vec results support the SWN results, whereas the Doc2Vec results only partially correspond to the Word2Vec and SWN results. On the Reuters data, Welch’s test did not find a significant difference between the SWN evaluations of the most positive and most negative texts. On the same data, the Word2Vec and Doc2Vec results correspond only in part to the SWN results. In unsupervised sentiment analysis of medical and scientific texts, the Word2Vec sentiment analysis has been more consistent with the SentiWordNet sentiment assessment than the Doc2Vec sentiment analysis. Welch’s test of the SentiWordNet results has been a strong indicator of future correspondence between Word2Vec and SentiWordNet results. |
format | Online Article Text |
id | pubmed-8364743 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer Singapore |
record_format | MEDLINE/PubMed |
spelling | pubmed-8364743 2021-08-15 Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts Chen, Qufei Sokolova, Marina SN Comput Sci Original Research
Springer Singapore 2021-08-15 2021 /pmc/articles/PMC8364743/ /pubmed/34414378 http://dx.doi.org/10.1007/s42979-021-00807-1 Text en © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8364743/ https://www.ncbi.nlm.nih.gov/pubmed/34414378 http://dx.doi.org/10.1007/s42979-021-00807-1 |
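
The description above reports that Welch’s test was used to compare the SWN evaluations of the most positive and most negative texts. As a rough illustration only, the following Python sketch computes Welch’s t statistic and the Welch–Satterthwaite degrees of freedom for two samples. The function name `welch_t_test` and the score lists are hypothetical and are not taken from the article; the sketch does not reproduce the authors’ sentiment metrics or data.

```python
import math
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test.

    Illustrative helper only; the name and interface are assumptions,
    not part of the catalogued article.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")

    # Squared standard error of each sample mean (unbiased variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")

    # Welch's t: difference of means over the combined standard error.
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


if __name__ == "__main__":
    # Made-up sentiment scores, for illustration only.
    most_positive = [0.42, 0.38, 0.51, 0.47, 0.40]
    most_negative = [0.12, 0.20, 0.09, 0.15, 0.18, 0.11]
    t_stat, df = welch_t_test(most_positive, most_negative)
    print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A p-value would additionally require the Student t distribution with `df` degrees of freedom, which the standard library does not provide; a statistics package would be needed for that step.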