LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things

Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and ve...

Bibliographic Details
Main Authors: Wang, Jin; Tang, Yangning; He, Shiming; Zhao, Changqing; Sharma, Pradip Kumar; Alfarraj, Osama; Tolba, Amr
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7249657/
https://www.ncbi.nlm.nih.gov/pubmed/32357404
http://dx.doi.org/10.3390/s20092451
author Wang, Jin
Tang, Yangning
He, Shiming
Zhao, Changqing
Sharma, Pradip Kumar
Alfarraj, Osama
Tolba, Amr
collection PubMed
description Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize the words. However, the computing cost of training word2vec is high. Anomalies in logs depend not only on an individual log message but also on the log message sequence. Therefore, the word vectors from word2vec cannot be used directly: they need to be transformed into log event vectors and then into log sequence vectors. To reduce computational cost and avoid multiple transformations, in this paper we propose an offline feature extraction model, named LogEvent2vec, which takes the log event as the input of word2vec to extract the relevance between log events and vectorize log events directly. LogEvent2vec can work with any coordinate transformation method and anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary or tf-idf, and three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) are trained to detect the anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy compared with word2vec. LogEvent2vec with bary and Random Forest achieves the best F1-score, and LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.
format Online
Article
Text
id pubmed-7249657
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7249657 2020-06-10 LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things Wang, Jin Tang, Yangning He, Shiming Zhao, Changqing Sharma, Pradip Kumar Alfarraj, Osama Tolba, Amr Sensors (Basel) Article
MDPI 2020-04-26 /pmc/articles/PMC7249657/ /pubmed/32357404 http://dx.doi.org/10.3390/s20092451 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7249657/
https://www.ncbi.nlm.nih.gov/pubmed/32357404
http://dx.doi.org/10.3390/s20092451
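
The description field in this record mentions turning log event vectors into a log sequence vector by bary or tf-idf. The record does not give the formulas, so the Python below is only a minimal sketch under stated assumptions: bary is taken to be the plain average of the event vectors in a sequence, and tf-idf is taken to be a sum of event vectors weighted by term frequency times log inverse document frequency. The event IDs (E1, E2, E3), their 2-dimensional vectors, and the function names are hypothetical. In the approach described, the event vectors would come from word2vec trained on sequences of log events; here they are hard-coded for illustration.

```python
import math
from collections import Counter


def bary_vector(sequence, event_vectors):
    """Assumed 'bary' transform: average the vectors of all events in one sequence."""
    if not sequence:
        raise ValueError("sequence must contain at least one event")
    dim = len(event_vectors[sequence[0]])
    total = [0.0] * dim
    for event in sequence:
        for i, x in enumerate(event_vectors[event]):
            total[i] += x
    return [t / len(sequence) for t in total]


def tfidf_vector(sequence, all_sequences, event_vectors):
    """Assumed tf-idf transform: sum of event vectors weighted by tf * idf.

    Assumes `sequence` is one of `all_sequences`, so every event in it
    has a document frequency of at least 1.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one event")
    dim = len(event_vectors[sequence[0]])
    n_sequences = len(all_sequences)
    total = [0.0] * dim
    for event, count in Counter(sequence).items():
        tf = count / len(sequence)                       # share of this event in the sequence
        df = sum(1 for s in all_sequences if event in s)  # sequences containing the event
        idf = math.log(n_sequences / df)                  # unsmoothed idf (an assumption)
        weight = tf * idf
        for i, x in enumerate(event_vectors[event]):
            total[i] += weight * x
    return total


# Hypothetical event vectors, standing in for word2vec output over event IDs.
event_vectors = {"E1": [1.0, 0.0], "E2": [0.0, 1.0], "E3": [1.0, 1.0]}
sequences = [["E1", "E2", "E1"], ["E1", "E3"]]

bary_features = [bary_vector(s, event_vectors) for s in sequences]
tfidf_features = [tfidf_vector(s, sequences, event_vectors) for s in sequences]
```

In this toy corpus E1 occurs in every sequence, so its idf is log(2/2) = 0 and it contributes nothing to the tf-idf vectors. Either feature list could then be passed to a supervised classifier such as the Random Forest, Naive Bayes, or neural network models named in the description.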