A Topic Modeling for ALICE'S Log Messages using Latent Dirichlet Allocation

Bibliographic Details
Main authors: Prayurahong, Pattapon; Phunchongharn, Phond; Chibante Barroso, Vasco
Language: English
Published: 2022
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1109/ICKII55100.2022.9983522
http://cds.cern.ch/record/2861086

Abstract: In modern software, where digital technology is everywhere, a system can generate a massive number of log messages every second. Like other data, logs can provide insight into, and in-depth knowledge of, the system given enough resources and time. However, not all systems have an organized logging system, and unorganized logs are messy and difficult to navigate. Organizing log messages poses several challenges. The volume of log data generated is too large to be handled by human labor alone. Log messages are not regular human communication, so thoroughly understanding their content requires assistance from specialists in that particular system. These problems exist everywhere, including high-performance computing systems such as those used in the ALICE experiment at CERN. In this paper, we propose a topic model for ALICE's log messages based on the Latent Dirichlet Allocation (LDA) algorithm. The objective is to convert messy log messages into categorized ones. We preprocessed the log messages into a Bag-of-Words representation. We then performed hyperparameter tuning to find a suitable number of topics, using topic coherence as the evaluation measure. We also applied the same method to a log dataset from HDFS (Hadoop Distributed File System) to validate the model. Finally, the outputs were handed to CERN domain experts for a final evaluation. Based on the results, we created a practical topic modeling framework for ALICE's log messages in a real scenario.
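The pipeline the abstract outlines (Bag-of-Words preprocessing, LDA, and choosing the number of topics by topic coherence) can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the record does not say which LDA implementation or coherence measure was used, so the tokenization rules, the collapsed Gibbs sampler, the UMass coherence score, the hyperparameters (`alpha`, `beta`, `iterations`), the function names (`tokenize`, `fit_lda`, `top_words`, `umass_coherence`, `choose_num_topics`) and the sample messages are all hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(message):
    """Bag-of-words preprocessing (assumed rules): lowercase, keep alphabetic
    runs only (digits are discarded), drop tokens shorter than 3 characters."""
    return [t for t in re.findall(r"[a-z]+", message.lower()) if len(t) > 2]


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns word counts per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assignments = []
    for d, doc in enumerate(docs):                       # random initial topics
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]                    # remove current token
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # full conditional p(z = t | rest), up to a constant
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k                    # add it back
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=5):
    """Most frequent words of each topic, ignoring zero counts."""
    return [[w for w, c in counts.most_common() if c > 0][:n] for counts in topic_word]


def umass_coherence(words, docs):
    """UMass coherence of one topic's top words (higher is better)."""
    doc_sets = [set(doc) for doc in docs]

    def df(*ws):  # number of documents containing all the given words
        return sum(all(w in s for w in ws) for s in doc_sets)

    score = 0.0
    for i in range(1, len(words)):
        for j in range(i):
            score += math.log((df(words[i], words[j]) + 1) / df(words[j]))
    return score


def choose_num_topics(docs, candidates, **lda_kwargs):
    """Fit one model per candidate K; keep the K with the best mean coherence."""
    scores = {}
    for k in candidates:
        topics = top_words(fit_lda(docs, k, **lda_kwargs))
        scores[k] = sum(umass_coherence(t, docs) for t in topics) / k
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    sample_logs = [  # made-up messages, not real ALICE or HDFS logs
        "Connection lost to node 12, retrying",
        "Connection timeout to node 7, retrying",
        "Connection refused by node 3, retrying",
        "Disk quota exceeded on volume data3",
        "Disk write failed on volume data1",
        "Disk read error on volume data2",
        "Run 101 started by shift operator",
        "Run 102 stopped by shift operator",
    ]
    docs = [tokenize(m) for m in sample_logs]
    best_k, scores = choose_num_topics(docs, candidates=[2, 3, 4], iterations=100)
    print("mean coherence per K:", scores)
    for k, words in enumerate(top_words(fit_lda(docs, best_k, iterations=100))):
        print(f"topic {k}: {words}")
```

The hand-written sampler only keeps the sketch self-contained; in practice a library implementation (for example gensim's `LdaModel` together with its `CoherenceModel`) would normally be used, and the coherence measure would be chosen to match whatever the evaluation actually requires.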

Record ID: cern-2861086 (OAI identifier: oai:cds.cern.ch:2861086)
Institution: European Organization for Nuclear Research (CERN)