A Topic Modeling for ALICE'S Log Messages using Latent Dirichlet Allocation
Main authors: | Prayurahong, Pattapon; Phunchongharn, Phond; Chibante Barroso, Vasco |
---|---|
Language: | eng |
Published: | 2022 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1109/ICKII55100.2022.9983522 http://cds.cern.ch/record/2861086 |
_version_ | 1780977796997185536 |
---|---|
author | Prayurahong, Pattapon; Phunchongharn, Phond; Chibante Barroso, Vasco |
author_sort | Prayurahong, Pattapon |
collection | CERN |
description | In modern software, where digital technology is everywhere, a system can generate a massive number of log messages every second. Like other data, logs can provide insight into, and in-depth knowledge of, the system given enough resources and time. However, not all systems have an organized logging system, and unorganized logs are messy and difficult to navigate. Organizing log messages poses several challenges. The volume of log data generated is so large that it cannot be handled by human labor alone. Log messages are not regular human communication, so thoroughly understanding their content requires assistance from specialists in that particular system. These problems exist everywhere, and high-performance computing systems such as those used in the ALICE experiment at CERN are no exception. In this paper, we propose a topic model for ALICE’s log messages using the Latent Dirichlet Allocation (LDA) algorithm. The objective is to convert the messy log messages into categorized ones. We preprocessed the log messages into a Bag of Words representation. We then tuned hyperparameters to find a suitable number of topics, using topic coherence as the evaluation metric. We also applied the same method to the HDFS log dataset to validate the model. Finally, the outputs were handed to CERN domain experts for final evaluation. Based on the results, we were able to create a practical topic modeling framework for ALICE’s log messages in a real scenario. |
id | cern-2861086 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2022 |
record_format | invenio |
title | A Topic Modeling for ALICE'S Log Messages using Latent Dirichlet Allocation |
title_sort | topic modeling for alice's log messages using latent dirichlet allocation |
topic | Computing and Computers |
url | https://dx.doi.org/10.1109/ICKII55100.2022.9983522 http://cds.cern.ch/record/2861086 |
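The abstract outlines a three-step pipeline: Bag of Words preprocessing of log messages, LDA topic modeling, and choosing the number of topics by topic coherence. The Python sketch below illustrates those steps on a handful of hypothetical log lines. It is not the authors' implementation. The tokenizer and stopword list (`tokenize`, `STOPWORDS`), the collapsed Gibbs sampler used to fit LDA, the UMass coherence score, and every function name and log line are assumptions made for illustration; the record does not state which LDA implementation or coherence measure the paper used.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy log lines for illustration; NOT from the ALICE or HDFS datasets.
LOGS = [
    "Connection to server 10.0.0.1 timed out after 30 s",
    "Connection refused by server 10.0.0.7",
    "Disk quota exceeded on volume /data01",
    "Disk write error on volume /data02",
    "Run 512 started by shifter",
    "Run 513 stopped by shifter",
]

STOPWORDS = {"to", "by", "on", "after", "s"}  # assumed, minimal stopword list


def tokenize(message):
    """Lowercase and keep alphabetic runs only, so variable fields such as
    IP addresses or run numbers are dropped (an assumed masking rule)."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def bag_of_words(docs):
    """Give each token an integer id; return the vocabulary and, per document,
    a sorted list of (word_id, count) pairs."""
    vocab = {}
    bows = []
    for doc in docs:
        counts = Counter(vocab.setdefault(tok, len(vocab)) for tok in doc)
        bows.append(sorted(counts.items()))
    return vocab, bows


def lda_gibbs(bows, n_words, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic-word counts (k x n_words)."""
    rng = random.Random(seed)
    docs = [[w for w, c in bow for _ in range(c)] for bow in bows]  # expand counts to tokens
    ndk = [[0] * k for _ in docs]            # tokens per (document, topic)
    nkw = [[0] * n_words for _ in range(k)]  # tokens per (topic, word)
    nk = [0] * k                             # tokens per topic

    def update(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            update(d, w, t, 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # exclude the token's current assignment
                # Full conditional p(z_i = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + n_words * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], 1)
    return nkw


def umass_coherence(top_words, doc_sets):
    """UMass coherence of one topic's top words (ordered by weight); higher is better."""
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            d_j = sum(1 for s in doc_sets if wj in s)
            d_ij = sum(1 for s in doc_sets if wi in s and wj in s)
            score += math.log((d_ij + 1) / d_j)
    return score


def select_num_topics(bows, n_words, candidates, top_n=3):
    """Fit one model per candidate K and keep the K with the best mean coherence."""
    doc_sets = [{w for w, _ in bow} for bow in bows]
    scores = {}
    for k in candidates:
        nkw = lda_gibbs(bows, n_words, k)
        coherences = [
            umass_coherence(sorted(range(n_words), key=lambda w: -row[w])[:top_n], doc_sets)
            for row in nkw
        ]
        scores[k] = sum(coherences) / k
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    vocab, bows = bag_of_words([tokenize(m) for m in LOGS])
    best_k, scores = select_num_topics(bows, len(vocab), candidates=[2, 3, 4])
    print("mean UMass coherence per K:", scores)
    print("selected number of topics:", best_k)
```

The hand-written sampler keeps the sketch dependency-free; in practice a library such as gensim (`gensim.models.LdaModel` together with `gensim.models.CoherenceModel`, which offers `u_mass` and `c_v` coherence) would usually take its place. On six toy messages the selected K carries no real meaning; coherence-based selection is only informative on a realistically sized log corpus.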