A topic modeling framework for spatio-temporal information management
Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, d...
Main authors: | Asghari, Mohsen, Sierra-Sosa, Daniel, Elmaghraby, Adel S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Ltd. 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338024/ https://www.ncbi.nlm.nih.gov/pubmed/32836694 http://dx.doi.org/10.1016/j.ipm.2020.102340 |
_version_ | 1783554594493693952 |
---|---|
author | Asghari, Mohsen Sierra-Sosa, Daniel Elmaghraby, Adel S. |
author_facet | Asghari, Mohsen Sierra-Sosa, Daniel Elmaghraby, Adel S. |
author_sort | Asghari, Mohsen |
collection | PubMed |
description | Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and times, in a dynamic environment such as Twitter is a challenging task that has recently gained considerable attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. We propose a model selector procedure with a hybrid indicator to tackle the challenge of online topic detection. In this framework, we built an automatic data processing pipeline with two levels of cleaning. Regular and deep cleaning are applied using multiple sources of meta knowledge to enhance data quality. Deep learning and transfer learning techniques are used to classify health-related tweets, with high accuracy and improved F1-Score. In this system, we used visualization to better understand trending topics. To demonstrate the validity of this framework, we implemented and applied it to health-related Twitter data from users originating in the USA over nine months. The results of this implementation show that this framework was able to detect and track the topics at a level comparable to manual annotation. To better explain the emerging and changing topics in various locations over time, the result is graphically displayed on top of the United States map. |
format | Online Article Text |
id | pubmed-7338024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-73380242020-07-07 A topic modeling framework for spatio-temporal information management Asghari, Mohsen Sierra-Sosa, Daniel Elmaghraby, Adel S. Inf Process Manag Article Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. We propose a model selector procedure with a hybrid indicator to tackle the challenge of online topic detection. In this framework, we built an automatic data processing pipeline with two levels of cleaning. Regular and deep cleaning are applied using multiple sources of meta knowledge to enhance data quality. Deep learning and transfer learning techniques are used to classify health-related tweets, with high accuracy and improved F1-Score. In this system, we used visualization to have a better understanding of trending topics. To demonstrate the validity of this framework, we implemented and applied it to health-related twitter data from users originating in the USA over nine months. The results of this implementation show that this framework was able to detect and track the topics at a level comparable to manual annotation. To better explain the emerging and changing topics in various locations over time the result is graphically displayed on top of the United States map. Elsevier Ltd. 2020-11 2020-07-06 /pmc/articles/PMC7338024/ /pubmed/32836694 http://dx.doi.org/10.1016/j.ipm.2020.102340 Text en © 2020 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Asghari, Mohsen Sierra-Sosa, Daniel Elmaghraby, Adel S. A topic modeling framework for spatio-temporal information management |
title | A topic modeling framework for spatio-temporal information management |
title_full | A topic modeling framework for spatio-temporal information management |
title_fullStr | A topic modeling framework for spatio-temporal information management |
title_full_unstemmed | A topic modeling framework for spatio-temporal information management |
title_short | A topic modeling framework for spatio-temporal information management |
title_sort | topic modeling framework for spatio-temporal information management |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338024/ https://www.ncbi.nlm.nih.gov/pubmed/32836694 http://dx.doi.org/10.1016/j.ipm.2020.102340 |
work_keys_str_mv | AT asgharimohsen atopicmodelingframeworkforspatiotemporalinformationmanagement AT sierrasosadaniel atopicmodelingframeworkforspatiotemporalinformationmanagement AT elmaghrabyadels atopicmodelingframeworkforspatiotemporalinformationmanagement AT asgharimohsen topicmodelingframeworkforspatiotemporalinformationmanagement AT sierrasosadaniel topicmodelingframeworkforspatiotemporalinformationmanagement AT elmaghrabyadels topicmodelingframeworkforspatiotemporalinformationmanagement |