
Spatio-temporal Crime Analysis and Forecasting on Twitter Data Using Machine Learning Algorithms


Bibliographic Details
Main Authors: Vivek, Meghashyam; Prathap, Boppuru Rudra
Format: Online Article Text
Language: English
Published: Springer Nature Singapore 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10163854/
https://www.ncbi.nlm.nih.gov/pubmed/37193217
http://dx.doi.org/10.1007/s42979-023-01816-y
Description
Summary: The concept of social media began to gain popularity in the late 1990s and has played a significant role in connecting people across the globe. The constant addition of features to existing social media platforms and the creation of new ones have helped amass and retain an extensive user base. Users can now share their views and provide detailed accounts of events from around the world to reach like-minded people. This led to the popularization of blogging and brought the posts of ordinary people into focus. These posts began to be verified and included in mainstream news articles, bringing about a revolution in journalism. This research aims to use a social media platform, Twitter, to classify, visualize, and forecast Indian crime tweet data and provide a spatio-temporal view of crime in the country using statistical and machine learning models. The Tweepy Python module's search function and the '#crime' query were used to scrape relevant tweets under geographical constraints, followed by substring-keyword classification using 318 unique crime keywords. The Bokeh and gmaps Python modules create analytical and geospatial visualizations, respectively. Time series forecasting of crime tweet count is performed by comparing the accuracy of Long Short-Term Memory (LSTM), Auto-Regressive Integrated Moving Average (ARIMA), and Seasonal Auto-Regressive Integrated Moving Average (SARIMA) models to determine the best model.
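
The substring-keyword classification step mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword map, the category names, and the function names `classify_tweet` and `count_categories` are hypothetical, and the article's actual list of 318 keywords is not reproduced in this record.

```python
from collections import Counter

# Hypothetical keyword-to-category map for illustration only;
# the article's 318 crime keywords are not listed in this record.
CRIME_KEYWORDS = {
    "murder": "homicide",
    "homicide": "homicide",
    "robbery": "theft",
    "theft": "theft",
    "kidnap": "kidnapping",
    "fraud": "fraud",
}


def classify_tweet(text, keywords=CRIME_KEYWORDS):
    """Return the sorted categories whose keyword occurs as a substring of the tweet."""
    lowered = text.lower()  # case-insensitive matching
    return sorted({category for kw, category in keywords.items() if kw in lowered})


def count_categories(tweets, keywords=CRIME_KEYWORDS):
    """Count how many tweets fall into each category (a tweet may match several)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(classify_tweet(tweet, keywords))
    return counts
```

Because matching is by substring, "kidnap" also matches "kidnapping" and "kidnapped". That is convenient for inflected forms, but it can produce false positives when a keyword happens to appear inside an unrelated word.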