Event detection in finance using hierarchical clustering algorithms on news and tweets

In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natur...

Bibliographic Details
Main authors: Carta, Salvatore, Consoli, Sergio, Piras, Luca, Podda, Alessandro Sebastian, Reforgiato Recupero, Diego
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Computational Linguistics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8157256/
https://www.ncbi.nlm.nih.gov/pubmed/34084918
http://dx.doi.org/10.7717/peerj-cs.438
_version_ 1783699641967050752
author Carta, Salvatore
Consoli, Sergio
Piras, Luca
Podda, Alessandro Sebastian
Reforgiato Recupero, Diego
author_facet Carta, Salvatore
Consoli, Sergio
Piras, Luca
Podda, Alessandro Sebastian
Reforgiato Recupero, Diego
author_sort Carta, Salvatore
collection PubMed
description In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time domain-specific clustering-based event-detection approach that integrates textual information coming, on one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to gain an assessment of the relevance of the event in public opinion. To identify the events of a day d, we create the lexicon by looking at news articles and stock data of previous days up to d − 1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones’ Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor’s 500 index time-series. The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, ranging from the Brexit Referendum to the recent outbreak of the Covid-19 pandemic in early 2020.
format Online
Article
Text
id pubmed-8157256
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8157256 2021-06-02 Event detection in finance using hierarchical clustering algorithms on news and tweets Carta, Salvatore Consoli, Sergio Piras, Luca Podda, Alessandro Sebastian Reforgiato Recupero, Diego PeerJ Comput Sci Computational Linguistics In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time domain-specific clustering-based event-detection approach that integrates textual information coming, on one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to gain an assessment of the relevance of the event in public opinion. To identify the events of a day d, we create the lexicon by looking at news articles and stock data of previous days up to d − 1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones’ Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor’s 500 index time-series. The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, ranging from the Brexit Referendum to the recent outbreak of the Covid-19 pandemic in early 2020. PeerJ Inc. 2021-05-10 /pmc/articles/PMC8157256/ /pubmed/34084918 http://dx.doi.org/10.7717/peerj-cs.438 Text en © 2021 Carta et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Computational Linguistics
Carta, Salvatore
Consoli, Sergio
Piras, Luca
Podda, Alessandro Sebastian
Reforgiato Recupero, Diego
Event detection in finance using hierarchical clustering algorithms on news and tweets
title Event detection in finance using hierarchical clustering algorithms on news and tweets
title_full Event detection in finance using hierarchical clustering algorithms on news and tweets
title_fullStr Event detection in finance using hierarchical clustering algorithms on news and tweets
title_full_unstemmed Event detection in finance using hierarchical clustering algorithms on news and tweets
title_short Event detection in finance using hierarchical clustering algorithms on news and tweets
title_sort event detection in finance using hierarchical clustering algorithms on news and tweets
topic Computational Linguistics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8157256/
https://www.ncbi.nlm.nih.gov/pubmed/34084918
http://dx.doi.org/10.7717/peerj-cs.438
work_keys_str_mv AT cartasalvatore eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT consolisergio eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT pirasluca eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT poddaalessandrosebastian eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT reforgiatorecuperodiego eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets