Deep Learning and Isolation Based Security for Intrusion Detection and Prevention in Grid Computing
Main author: | |
---|---|
Language: | eng |
Published: | 2019 |
Subjects: | |
Online access: | http://cds.cern.ch/record/2686424 |
Summary: | The use of distributed computational resources to solve scientific problems that require highly intensive data processing is a fundamental mechanism for modern scientific collaborations. The Worldwide Large Hadron Collider Computing Grid (WLCG) is one of the most important examples of a distributed infrastructure for scientific projects and one of the pioneering examples of grid computing. The WLCG is the global grid that analyzes data from the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), with 170 sites in 40 countries and more than 600,000 processing cores. The grid service providers grant users access to resources that they can use on demand to execute custom software applications for data analysis. The code that users can execute is completely flexible, and commonly there are no significant restrictions. This flexibility and the availability of immense computing power increase the security challenges of these environments. Attackers are a concern for grid administrators: they may request the execution of software containing malicious code that lets them compromise the underlying institutions’ infrastructure. Grid systems need security countermeasures that keep user code running and retain flexibility without allowing access to critical components. The administrators of grid systems also need to continuously monitor the activities that the applications carry out. An analysis of these activities is necessary to detect possible security issues, to identify ongoing incidents and to perform autonomous responses. The size and complexity of grid systems make manual security monitoring and response expensive and complicated for human analysts.
Legacy intrusion detection and prevention systems (IDPS) such as Snort and OSSEC are traditionally used for security incident monitoring in grids, clouds, clusters and standalone systems. However, these IDPS are limited by their use of hardcoded fixed rules that need to be updated continuously to cope with different threats. This thesis introduces an architecture for improving security in grid computing. The architecture integrates security by isolation, behavior monitoring and deep learning (DL) for the classification of real-time traces of the running user payloads, also known as grid jobs. Linux containers (LCs), the first component of the proposal, are used to provide isolation between grid jobs and to gather specific traceable information about the behavior of individual jobs. LCs offer a safe environment for the execution of arbitrary user scripts or binaries, protecting the sensitive components of the grid member organizations. Containers are a software sandboxing technique and form a lightweight alternative to other technologies such as virtual machines (VMs), which usually implement full machine-level emulation and can therefore significantly affect performance. This performance loss is commonly unacceptable in high-throughput computing scenarios. Containers enable the collection of monitoring information from the processes running inside them. The data collected via LC monitoring is used to feed a DL-based IDPS. DL methods can acquire knowledge from experience, which eliminates the need for operators to formally specify all the knowledge that a system requires. These methods can improve IDPS by building models that detect security incidents automatically and can generalize to new classes of issues. DL can produce lower false positive rates for intrusion detection, and it also provides a measure of false negatives, which can be improved with new training data.
Convolutional neural networks (CNNs) are used to distinguish between regular and malicious job classes. A set of samples is collected from regular production grid jobs from the grid infrastructure of “A Large Ion Collider Experiment” (ALICE) and from malicious Linux binaries from a malware research website. The features extracted from these samples are used for the training and validation of the machine learning (ML) models. A generative approach to enhance the required training data is also proposed. Recurrent neural networks (RNNs) are used as generative models for the simulation of training data that complements and improves the real collected dataset. This data augmentation strategy helps to compensate for the lack of training data in ML processes. The design characteristics, implementation details and testing environment of Arhuaco, a proof-of-concept realization of the researched architecture, are described. Arhuaco combines the isolation and behavior monitoring ideas with deep learning, using a hybrid supervised classification approach with natural language processing for the feature selection and the preprocessing of text-like input data from the traces of the job’s system calls and network activity. The proof-of-concept was evaluated in the context of the grid of the ALICE collaboration, a member of the WLCG. Empirical evaluations describe how recently proposed DL methods could outperform traditional ML methods in the task of intrusion detection in grid computing. CNNs applied to the classification of grid job behavior are compared to support vector machines (SVMs), one of the most popular algorithms in IDPS research. A long short-term memory (LSTM) network was tested to validate the idea that RNNs help to improve and increase the training dataset coverage for intrusion detection in grid computing.
An average runtime increase of 6.11% was observed when a set of regular ALICE grid jobs was run with Arhuaco using LC isolation, behavior monitoring and classification with DL. An accuracy of 99.52% was obtained when validating CNNs in the classification of previously unseen system call traces as normal or malicious. For the validation of network traces, an accuracy of 98.75% was achieved. To evaluate the LSTM method, the SVM was trained with simulated data. Once the model was built, the SVM was applied to the classification of novel unseen network data traces from the original dataset, which yielded a 0.72% improvement in accuracy. The results demonstrate that LCs used for isolation produce a moderate performance impact that can be reduced with several configuration options. CNNs applied to the classification of behavior trace data could distinguish between normal and malicious jobs with close to 100% accuracy. The generative method via LSTM improved and increased a training dataset for intrusion detection in grid computing. The proposed approaches solve the problems of analyzing the activity of grid jobs, identifying malicious activity and keeping traceability of the user-generated events. Therefore, better and stronger evidence to detect attacks and to find their source can be collected. |
---|
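The summary mentions natural language processing for preprocessing text-like system call traces before classification. As an illustration only, the sketch below shows one common way such traces can be turned into fixed-length integer sequences suitable as input to a sequence classifier such as a CNN. It is not taken from the thesis or from Arhuaco: the function names (`tokenize`, `build_vocab`, `encode`), the reserved ids, and the toy traces are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List

PAD, UNK = 0, 1  # assumed reserved ids: padding and out-of-vocabulary tokens


def tokenize(trace: str) -> List[str]:
    """Split a whitespace-separated trace of system call names into tokens."""
    return trace.lower().split()


def build_vocab(traces: List[str], max_size: int = 1000) -> Dict[str, int]:
    """Map the most frequent tokens to integer ids, starting after PAD/UNK."""
    counts = Counter(tok for t in traces for tok in tokenize(t))
    # Sort by descending frequency, then alphabetically, for a stable mapping.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i + 2 for i, (tok, _) in enumerate(ranked[:max_size])}


def encode(trace: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Encode a trace as a fixed-length id sequence (truncate or right-pad)."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(trace)][:length]
    return ids + [PAD] * (length - len(ids))


# Hypothetical toy traces; real traces would come from container monitoring.
traces = [
    "open read read close",
    "socket connect write close",
]
vocab = build_vocab(traces)
encoded = [encode(t, vocab, length=6) for t in traces]
```

Each encoded sequence has the same length, so a batch can be stacked into a matrix and passed to an embedding layer followed by convolutional layers. System calls not seen when the vocabulary was built map to the `UNK` id instead of raising an error, which matters when classifying previously unseen traces.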