
Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs

Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to not...


Bibliographic Details
Main Authors:	Alharbi, Abdullah I., Lee, Mark
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298193/ http://dx.doi.org/10.1007/978-3-030-51310-8_20

_version_	1783547166865752064
author	Alharbi, Abdullah I. Lee, Mark
author_facet	Alharbi, Abdullah I. Lee, Mark
author_sort	Alharbi, Abdullah I.
collection	PubMed
description	Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to note that a variety of different linguistic domains are involved. This is mainly because social media users employ various dialects in their communications. While training word-level models with such informal text can lead to words being captured that have the same meanings, these models cannot capture all words that can be encountered in the real world due to out-of-vocabulary (OOV) words. The inability to identify words is one of the main limitations of this word-level model. In contrast, character-level embeddings can work effectively with this problem through their ability to learn the vectors of character n-grams or parts of words. We take advantage of both character- and word-level models to discover more effective methods to represent Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they offer improved state-of-the-art results for the task of Arabic emotion intensity, outperforming the top-performing systems that employ a combination of deep neural networks and several other features.
format	Online Article Text
id	pubmed-7298193
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-72981932020-06-17 Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs Alharbi, Abdullah I. Lee, Mark Natural Language Processing and Information Systems Article Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to note that a variety of different linguistic domains are involved. This is mainly because social media users employ various dialects in their communications. While training word-level models with such informal text can lead to words being captured that have the same meanings, these models cannot capture all words that can be encountered in the real world due to out-of-vocabulary (OOV) words. The inability to identify words is one of the main limitations of this word-level model. In contrast, character-level embeddings can work effectively with this problem through their ability to learn the vectors of character n-grams or parts of words. We take advantage of both character- and word-level models to discover more effective methods to represent Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they offer improved state-of-the-art results for the task of Arabic emotion intensity, outperforming the top-performing systems that employ a combination of deep neural networks and several other features. 
2020-05-26 /pmc/articles/PMC7298193/ http://dx.doi.org/10.1007/978-3-030-51310-8_20 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Alharbi, Abdullah I. Lee, Mark Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
title	Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
title_full	Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
title_fullStr	Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
title_full_unstemmed	Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
title_short	Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
title_sort	combining character and word embeddings for affect in arabic informal social media microblogs
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7298193/ http://dx.doi.org/10.1007/978-3-030-51310-8_20
work_keys_str_mv	AT alharbiabdullahi combiningcharacterandwordembeddingsforaffectinarabicinformalsocialmediamicroblogs AT leemark combiningcharacterandwordembeddingsforaffectinarabicinformalsocialmediamicroblogs
