
Two is better than one: Using a single emotion lexicon can lead to unreliable conclusions

Emotion lexicons have become a popular method for quantifying affect in large amounts of textual data (e.g., social media posts). There are multiple independently developed emotion lexicons, which tend to correlate positively, but not perfectly, with one another. Such differences between lexicons may not mat...


Bibliographic Details
Main authors:	Czarnek, Gabriela; Stillwell, David
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9565755/ https://www.ncbi.nlm.nih.gov/pubmed/36240202 http://dx.doi.org/10.1371/journal.pone.0275910

collection	PubMed
description	Emotion lexicons have become a popular method for quantifying affect in large amounts of textual data (e.g., social media posts). There are multiple independently developed emotion lexicons, which tend to correlate positively, but not perfectly, with one another. Such differences between lexicons may not matter if they are just unsystematic noise, but systematic differences could affect the conclusions of a study. The goal of this paper is to examine whether two extensively used, apparently domain-independent lexicons for emotion analysis give the same answer to a theory-driven research question. Specifically, we use the Linguistic Inquiry and Word Count (LIWC) and the NRC Word-Emotion Association Lexicon (NRC). As an example, we investigate whether older people express themselves more positively in their language use. We examined nearly 5 million tweets created by 3,573 people between 18 and 78 years old and found that both methods show an increase in positive affect until age 50. After that age, however, according to LIWC, positive affect drops sharply, whereas according to NRC, positive affect increases steadily until age 65 and then levels off. Thus, using one or the other method would lead researchers to drastically different theoretical conclusions regarding affect in older age. We unpack why the two methods give inconsistent conclusions and show that this was mostly due to a particular class of words: those related to politics. We conclude that using a single lexicon might lead to unreliable conclusions, so we suggest that researchers routinely use at least two lexicons. If both lexicons come to the same conclusion, then the research evidence is reliable; if not, researchers should further examine the lexicons to find out what difference might be causing the inconclusive result.
format	Online Article Text
id	pubmed-9565755
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9565755 2022-10-15 PLoS One Research Article Public Library of Science 2022-10-14 /pmc/articles/PMC9565755/ /pubmed/36240202 http://dx.doi.org/10.1371/journal.pone.0275910 Text en © 2022 Czarnek, Stillwell https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Two is better than one: Using a single emotion lexicon can lead to unreliable conclusions


Similar Items