
LEIA: Linguistic Embeddings for the Identification of Affect

The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the...


Bibliographic Details
Main Authors:	Aroyehun, Segun Taofeek; Malik, Lukas; Metzler, Hannah; Haimerl, Nikolas; Di Natale, Anna; Garcia, David
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2023
Subjects:	Regular Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654159/ https://www.ncbi.nlm.nih.gov/pubmed/38020476 http://dx.doi.org/10.1140/epjds/s13688-023-00427-0

author	Aroyehun, Segun Taofeek; Malik, Lukas; Metzler, Hannah; Haimerl, Nikolas; Di Natale, Anna; Garcia, David
collection	PubMed
description	The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA’s robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.
format	Online Article Text
id	pubmed-10654159
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-10654159; 2023-11-16; LEIA: Linguistic Embeddings for the Identification of Affect; EPJ Data Sci; Regular Article; Springer Berlin Heidelberg 2023-11-16; /pmc/articles/PMC10654159/ /pubmed/38020476 http://dx.doi.org/10.1140/epjds/s13688-023-00427-0 ; Text en; © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/ .
title	LEIA: Linguistic Embeddings for the Identification of Affect
topic	Regular Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654159/ https://www.ncbi.nlm.nih.gov/pubmed/38020476 http://dx.doi.org/10.1140/epjds/s13688-023-00427-0
