
A topic modeling framework for spatio-temporal information management

Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, d...


Bibliographic Details
Main authors: Asghari, Mohsen; Sierra-Sosa, Daniel; Elmaghraby, Adel S.
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338024/
https://www.ncbi.nlm.nih.gov/pubmed/32836694
http://dx.doi.org/10.1016/j.ipm.2020.102340
author Asghari, Mohsen
Sierra-Sosa, Daniel
Elmaghraby, Adel S.
author_facet Asghari, Mohsen
Sierra-Sosa, Daniel
Elmaghraby, Adel S.
author_sort Asghari, Mohsen
collection PubMed
description Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. We propose a model selector procedure with a hybrid indicator to tackle the challenge of online topic detection. In this framework, we built an automatic data processing pipeline with two levels of cleaning. Regular and deep cleaning are applied using multiple sources of meta knowledge to enhance data quality. Deep learning and transfer learning techniques are used to classify health-related tweets, with high accuracy and improved F1-Score. In this system, we used visualization to have a better understanding of trending topics. To demonstrate the validity of this framework, we implemented and applied it to health-related Twitter data from users originating in the USA over nine months. The results of this implementation show that this framework was able to detect and track the topics at a level comparable to manual annotation. To better explain the emerging and changing topics in various locations over time, the result is graphically displayed on top of the United States map.
format Online
Article
Text
id pubmed-7338024
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-7338024 2020-07-07 A topic modeling framework for spatio-temporal information management Asghari, Mohsen Sierra-Sosa, Daniel Elmaghraby, Adel S. Inf Process Manag Article Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. We propose a model selector procedure with a hybrid indicator to tackle the challenge of online topic detection. In this framework, we built an automatic data processing pipeline with two levels of cleaning. Regular and deep cleaning are applied using multiple sources of meta knowledge to enhance data quality. Deep learning and transfer learning techniques are used to classify health-related tweets, with high accuracy and improved F1-Score. In this system, we used visualization to have a better understanding of trending topics. To demonstrate the validity of this framework, we implemented and applied it to health-related Twitter data from users originating in the USA over nine months. The results of this implementation show that this framework was able to detect and track the topics at a level comparable to manual annotation. To better explain the emerging and changing topics in various locations over time, the result is graphically displayed on top of the United States map. Elsevier Ltd. 2020-11 2020-07-06 /pmc/articles/PMC7338024/ /pubmed/32836694 http://dx.doi.org/10.1016/j.ipm.2020.102340 Text en © 2020 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
Asghari, Mohsen
Sierra-Sosa, Daniel
Elmaghraby, Adel S.
A topic modeling framework for spatio-temporal information management
title A topic modeling framework for spatio-temporal information management
title_full A topic modeling framework for spatio-temporal information management
title_fullStr A topic modeling framework for spatio-temporal information management
title_full_unstemmed A topic modeling framework for spatio-temporal information management
title_short A topic modeling framework for spatio-temporal information management
title_sort topic modeling framework for spatio-temporal information management
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338024/
https://www.ncbi.nlm.nih.gov/pubmed/32836694
http://dx.doi.org/10.1016/j.ipm.2020.102340