Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches
Can we rely on computational methods to accurately analyze complex texts? To answer this question, we compared different dictionary and scaling methods used in predicting the sentiment of German literature reviews to the “gold standard” of human-coded sentiments. Literature reviews constitute a challenging text corpus for computational analysis as they not only contain different text levels—for example, a summary of the work and the reviewer's appraisal—but are also characterized by subtle and ambiguous language elements. To take the nuanced sentiments of literature reviews into account, we worked with a metric rather than a dichotomous scale for sentiment analysis. The results of our analyses show that the predicted sentiments of prefabricated dictionaries, which are computationally efficient and require minimal adaption, have a low to medium correlation with the human-coded sentiments (r between 0.32 and 0.39). The accuracy of self-created dictionaries using word embeddings (both pre-trained and self-trained) was considerably lower (r between 0.10 and 0.28). Given the high coding intensity and contingency on seed selection as well as the degree of data pre-processing of word embeddings that we found with our data, we would not recommend them for complex texts without further adaptation. While fully automated approaches appear not to work in accurately predicting text sentiments with complex texts such as ours, we found relatively high correlations with a semiautomated approach (r of around 0.6)—which, however, requires intensive human coding efforts for the training dataset. In addition to illustrating the benefits and limits of computational approaches in analyzing complex text corpora and the potential of metric rather than binary scales of text sentiment, we also provide a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts.
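As a rough, purely illustrative sketch of the comparison described in the abstract (not the authors' actual pipeline or dictionaries), the Python snippet below scores texts as the mean polarity of matched dictionary terms on a metric scale and correlates the predictions with human-coded sentiments via Pearson's r; the tiny polarity dictionary, tokenizer, and toy data are hypothetical. A second sketch after the full record below illustrates the embedding-based dictionary approach the abstract also mentions.

```python
# Minimal sketch: dictionary-based sentiment scores on a metric scale,
# compared with human-coded sentiments via Pearson's r.
# The polarity dictionary, tokenizer, and toy data are hypothetical
# placeholders, not the resources used in the article.
import re
from statistics import mean

# Hypothetical polarity weights in [-1, 1]
POLARITY = {"brillant": 1.0, "gelungen": 0.8, "schwach": -0.7, "enttäuschend": -1.0}

def tokenize(text: str) -> list[str]:
    """Crude tokenizer: lowercase word characters only."""
    return re.findall(r"\w+", text.lower())

def dictionary_score(text: str) -> float:
    """Mean polarity of matched dictionary terms; 0.0 (neutral) if nothing matches."""
    hits = [POLARITY[tok] for tok in tokenize(text) if tok in POLARITY]
    return mean(hits) if hits else 0.0

def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy reviews with human-coded sentiments on a metric scale from -1 to 1.
reviews = ["Brillant und gelungen.", "Schwach und enttäuschend.", "Insgesamt gelungen."]
human_codes = [0.9, -0.8, 0.4]

predicted = [dictionary_score(text) for text in reviews]
print("predicted:", predicted)
print("Pearson r:", round(pearson_r(predicted, human_codes), 2))
```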
Main Authors: | Munnes, Stefan; Harsch, Corinna; Knobloch, Marcel; Vogel, Johannes S.; Hipp, Lena; Schilling, Erik |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Big Data |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114298/ https://www.ncbi.nlm.nih.gov/pubmed/35600329 http://dx.doi.org/10.3389/fdata.2022.886362 |
_version_ | 1784709740146720768 |
---|---|
author | Munnes, Stefan; Harsch, Corinna; Knobloch, Marcel; Vogel, Johannes S.; Hipp, Lena; Schilling, Erik |
author_facet | Munnes, Stefan; Harsch, Corinna; Knobloch, Marcel; Vogel, Johannes S.; Hipp, Lena; Schilling, Erik |
author_sort | Munnes, Stefan |
collection | PubMed |
description | Can we rely on computational methods to accurately analyze complex texts? To answer this question, we compared different dictionary and scaling methods used in predicting the sentiment of German literature reviews to the “gold standard” of human-coded sentiments. Literature reviews constitute a challenging text corpus for computational analysis as they not only contain different text levels—for example, a summary of the work and the reviewer's appraisal—but are also characterized by subtle and ambiguous language elements. To take the nuanced sentiments of literature reviews into account, we worked with a metric rather than a dichotomous scale for sentiment analysis. The results of our analyses show that the predicted sentiments of prefabricated dictionaries, which are computationally efficient and require minimal adaption, have a low to medium correlation with the human-coded sentiments (r between 0.32 and 0.39). The accuracy of self-created dictionaries using word embeddings (both pre-trained and self-trained) was considerably lower (r between 0.10 and 0.28). Given the high coding intensity and contingency on seed selection as well as the degree of data pre-processing of word embeddings that we found with our data, we would not recommend them for complex texts without further adaptation. While fully automated approaches appear not to work in accurately predicting text sentiments with complex texts such as ours, we found relatively high correlations with a semiautomated approach (r of around 0.6)—which, however, requires intensive human coding efforts for the training dataset. In addition to illustrating the benefits and limits of computational approaches in analyzing complex text corpora and the potential of metric rather than binary scales of text sentiment, we also provide a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts. |
format | Online Article Text |
id | pubmed-9114298 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9114298 2022-05-19 Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches Munnes, Stefan Harsch, Corinna Knobloch, Marcel Vogel, Johannes S. Hipp, Lena Schilling, Erik Front Big Data Big Data Can we rely on computational methods to accurately analyze complex texts? To answer this question, we compared different dictionary and scaling methods used in predicting the sentiment of German literature reviews to the “gold standard” of human-coded sentiments. Literature reviews constitute a challenging text corpus for computational analysis as they not only contain different text levels—for example, a summary of the work and the reviewer's appraisal—but are also characterized by subtle and ambiguous language elements. To take the nuanced sentiments of literature reviews into account, we worked with a metric rather than a dichotomous scale for sentiment analysis. The results of our analyses show that the predicted sentiments of prefabricated dictionaries, which are computationally efficient and require minimal adaption, have a low to medium correlation with the human-coded sentiments (r between 0.32 and 0.39). The accuracy of self-created dictionaries using word embeddings (both pre-trained and self-trained) was considerably lower (r between 0.10 and 0.28). Given the high coding intensity and contingency on seed selection as well as the degree of data pre-processing of word embeddings that we found with our data, we would not recommend them for complex texts without further adaptation. While fully automated approaches appear not to work in accurately predicting text sentiments with complex texts such as ours, we found relatively high correlations with a semiautomated approach (r of around 0.6)—which, however, requires intensive human coding efforts for the training dataset. In addition to illustrating the benefits and limits of computational approaches in analyzing complex text corpora and the potential of metric rather than binary scales of text sentiment, we also provide a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts. Frontiers Media S.A. 2022-05-04 /pmc/articles/PMC9114298/ /pubmed/35600329 http://dx.doi.org/10.3389/fdata.2022.886362 Text en Copyright © 2022 Munnes, Harsch, Knobloch, Vogel, Hipp and Schilling. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data Munnes, Stefan Harsch, Corinna Knobloch, Marcel Vogel, Johannes S. Hipp, Lena Schilling, Erik Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches |
title | Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches |
title_full | Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches |
title_fullStr | Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches |
title_full_unstemmed | Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches |
title_short | Examining Sentiment in Complex Texts. A Comparison of Different Computational Approaches |
title_sort | examining sentiment in complex texts. a comparison of different computational approaches |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114298/ https://www.ncbi.nlm.nih.gov/pubmed/35600329 http://dx.doi.org/10.3389/fdata.2022.886362 |
work_keys_str_mv | AT munnesstefan examiningsentimentincomplextextsacomparisonofdifferentcomputationalapproaches AT harschcorinna examiningsentimentincomplextextsacomparisonofdifferentcomputationalapproaches AT knoblochmarcel examiningsentimentincomplextextsacomparisonofdifferentcomputationalapproaches AT vogeljohanness examiningsentimentincomplextextsacomparisonofdifferentcomputationalapproaches AT hipplena examiningsentimentincomplextextsacomparisonofdifferentcomputationalapproaches AT schillingerik examiningsentimentincomplextextsacomparisonofdifferentcomputationalapproaches |
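The abstract also reports that self-created dictionaries built from word embeddings were highly sensitive to seed selection and pre-processing. The sketch below shows only the general technique of expanding seed words into a sentiment dictionary via embedding similarity; it assumes a pre-trained embedding file loadable with gensim's KeyedVectors (gensim is an assumption for illustration, not necessarily the library used in the article), and the file name and seed words are hypothetical placeholders.

```python
# Sketch: expand positive/negative seed words into a sentiment dictionary
# via embedding similarity. gensim is assumed here for illustration only;
# the embedding file path and seed lists are hypothetical placeholders.
from gensim.models import KeyedVectors

EMBEDDINGS_PATH = "german_embeddings.vec"  # hypothetical word2vec-format file
POSITIVE_SEEDS = ["gut", "brillant", "gelungen"]
NEGATIVE_SEEDS = ["schlecht", "schwach", "enttäuschend"]

def expand_seeds(kv: KeyedVectors, seeds: list[str], topn: int = 50) -> set[str]:
    """Return the in-vocabulary seeds plus their nearest neighbors in embedding space."""
    present = [w for w in seeds if w in kv.key_to_index]  # skip out-of-vocabulary seeds
    if not present:
        return set()
    neighbors = kv.most_similar(positive=present, topn=topn)
    return set(present) | {word for word, _similarity in neighbors}

def build_dictionary(kv: KeyedVectors) -> dict[str, float]:
    """Map expanded positive terms to +1.0 and expanded negative terms to -1.0."""
    positive = expand_seeds(kv, POSITIVE_SEEDS)
    negative = expand_seeds(kv, NEGATIVE_SEEDS)
    ambiguous = positive & negative  # terms landing on both sides are dropped
    lexicon = {w: 1.0 for w in positive - ambiguous}
    lexicon.update({w: -1.0 for w in negative - ambiguous})
    return lexicon

if __name__ == "__main__":
    kv = KeyedVectors.load_word2vec_format(EMBEDDINGS_PATH, binary=False)
    lexicon = build_dictionary(kv)
    print(f"{len(lexicon)} dictionary terms")
```

How such a self-created dictionary performs depends heavily on which seeds are chosen and how the corpus is pre-processed, which is consistent with the caution expressed in the abstract.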