Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts

Bibliographic Details
Main authors: Chen, Qufei; Sokolova, Marina
Format: Online, Article, Text
Language: English
Published: Springer Singapore, 2021
Subjects: Original Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8364743/
https://www.ncbi.nlm.nih.gov/pubmed/34414378
http://dx.doi.org/10.1007/s42979-021-00807-1

Abstract: We analyze the performance of unsupervised embedding algorithms in sentiment analysis of knowledge-rich data sets. We apply the state-of-the-art embedding algorithms Word2Vec and Doc2Vec as the learning techniques. The algorithms build word and document embeddings in an unsupervised manner. To assess the algorithms' performance, we define sentiment metrics and use the semantic lexicon SentiWordNet (SWN) to establish the benchmark measures. Our empirical results are obtained on the Obesity data set from the i2b2 clinical discharge summaries and on the Reuters Science data set. We use Welch's test to analyze the obtained sentiment evaluations. On the Obesity data, Welch's test found a significant difference between the SWN evaluations of the most positive and the most negative texts. On the same data, the Word2Vec results support the SWN results, whereas the Doc2Vec results only partially correspond to the Word2Vec and SWN results. On the Reuters data, Welch's test did not find a significant difference between the SWN evaluations of the most positive and the most negative texts. On the same data, the Word2Vec and Doc2Vec results only partly correspond to the SWN results. In unsupervised sentiment analysis of medical and scientific texts, Word2Vec sentiment analysis has been more consistent with the SentiWordNet assessment than Doc2Vec sentiment analysis. Welch's test on the SentiWordNet results has been a strong indicator of the subsequent correspondence between the Word2Vec and SentiWordNet results.
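The abstract uses Welch's test to compare the SWN evaluations of the most positive and the most negative texts. As a minimal sketch, assuming each text has already been reduced to a single numeric SentiWordNet score (this record does not give the paper's exact sentiment metrics), Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom can be computed with the Python standard library. The function name `welch_t_test` and the input scores are hypothetical and for illustration only; they are not the authors' code or data.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t, df) for Welch's two-sample t-test.

    Hypothetical helper: compares the means of two independent samples
    without assuming equal variances.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    n1, n2 = len(a), len(b)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se1 = variance(a) / n1
    se2 = variance(b) / n2
    if se1 + se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical per-text SWN scores, made up for illustration (not from the paper).
most_positive = [0.12, 0.10, 0.15, 0.11]
most_negative = [0.05, 0.07, 0.04, 0.06]

t, df = welch_t_test(most_positive, most_negative)
print(f"t = {t:.2f}, df = {df:.2f}")  # prints: t = 5.17, df = 4.90
```

The sketch stops at the statistic and degrees of freedom because the standard library has no Student's t distribution. For a p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.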

Journal: SN Computer Science (SN Comput Sci), Original Research; published online 2021-08-15
Copyright: © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.