Reddit financial image post sentiment dataset

The dataset presented in this paper consists of sentiment information extracted from image and text data of financial subreddit posts. Members of these subreddits post about their trading behavior, express their opinions, and discuss capital market trends. Their posts contain sentiment information o...

Bibliographic Details
Main authors:	Fottner, Alexander; Okhrin, Yarema; Pfahler, Jonathan; Wustl, Julian
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9747619/ https://www.ncbi.nlm.nih.gov/pubmed/36533290 http://dx.doi.org/10.1016/j.dib.2022.108759

author	Fottner, Alexander; Okhrin, Yarema; Pfahler, Jonathan; Wustl, Julian
author_facet	Fottner, Alexander; Okhrin, Yarema; Pfahler, Jonathan; Wustl, Julian
author_sort	Fottner, Alexander
collection	PubMed
description	The dataset presented in this paper consists of sentiment information extracted from image and text data of financial subreddit posts. Members of these subreddits post about their trading behavior, express their opinions, and discuss capital market trends. Their posts contain sentiment information on financial topics as well as signaling information on trading decisions. Frequently, members post screenshots of their portfolios from their mobile broker apps. We collected the posts, processed them to extract sentiment scores using various methods, and anonymized them. The dataset therefore contains no content from the posts and no information about their authors, only the processed sentiment information from each post. Furthermore, financial tickers mentioned in the posts are tracked, so that the effect of sentiment in the posts can be attributed to financial products and used in financial forecasting. The posts were collected using the Reddit [2] and Pushshift [3] APIs and processed using an Amazon Web Services architecture. A fine-tuned MobileNets artificial neural network [4] was used to classify images into four distinct categories, which had been determined in a preliminary analysis. The categories are classical memes, number posts (e.g. screenshots of mobile broker portfolios), text posts (e.g. screenshots from Twitter), and chart posts (e.g. other financial screenshots, such as charts). Images were classified into these four categories because they differ so fundamentally that a different extraction method had to be applied to each category. OCR methods [5] were used to extract text from images. Custom methods were applied to extract sentiment and other information from the resulting text. The data [1] is available at 20-minute intervals and can be used in many areas, such as financial forecasting and analyzing sentiment dynamics in social media posts.
format	Online Article Text
id	pubmed-9747619
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9747619 2022-12-15 Reddit financial image post sentiment dataset Fottner, Alexander; Okhrin, Yarema; Pfahler, Jonathan; Wustl, Julian Data Brief Data Article Elsevier 2022-11-17 /pmc/articles/PMC9747619/ /pubmed/36533290 http://dx.doi.org/10.1016/j.dib.2022.108759 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	Reddit financial image post sentiment dataset
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9747619/ https://www.ncbi.nlm.nih.gov/pubmed/36533290 http://dx.doi.org/10.1016/j.dib.2022.108759
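
The description states that the data is available on a 20-minute basis but does not say how post-level scores are combined. The following is a minimal Python sketch of one plausible way to bucket per-post sentiment scores into 20-minute intervals per ticker. The tuple layout (timestamp, ticker, score), the use of a simple mean, the ticker "ABC", and all sample values are assumptions made for illustration; they are not taken from the dataset or the paper.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 20 * 60  # 20-minute interval, as stated in the description


def bucket_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 20-minute bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def aggregate(posts):
    """Average per-post scores for each (bucket start, ticker) pair.

    `posts` is an iterable of (timestamp, ticker, score) tuples; this layout
    is hypothetical and chosen only for the example.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, ticker, score in posts:
        acc = sums[(bucket_start(ts), ticker)]
        acc[0] += score
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Made-up sample posts for a made-up ticker.
posts = [
    (datetime(2021, 1, 28, 14, 3, tzinfo=timezone.utc), "ABC", 0.8),
    (datetime(2021, 1, 28, 14, 17, tzinfo=timezone.utc), "ABC", 0.4),
    (datetime(2021, 1, 28, 14, 25, tzinfo=timezone.utc), "ABC", -0.2),
]
result = aggregate(posts)
```

In this sketch the first two posts fall into the bucket starting at 14:00 UTC and the third into the bucket starting at 14:20 UTC. A mean is only one possible aggregate; a sum or a count of posts per bucket would follow the same pattern.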