Optimised Lambda Architecture for Monitoring Scientific Infrastructure

Within scientific infrastructure, scientists execute millions of computational jobs daily, resulting in the movement of petabytes of data over the heterogeneous infrastructure. Monitoring the computing and user activities over such a complex infrastructure is incredibly demanding. Whereas present soluti...

Bibliographic Details
Main authors: Suthakar, Uthayanath; Magnoni, Luca; Smith, David Ryan; Khan, Akram
Language: eng
Published: 2021
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1109/tpds.2017.2772241
http://cds.cern.ch/record/2751541
_version_ 1780969240365367296
author Suthakar, Uthayanath
Magnoni, Luca
Smith, David Ryan
Khan, Akram
author_facet Suthakar, Uthayanath
Magnoni, Luca
Smith, David Ryan
Khan, Akram
author_sort Suthakar, Uthayanath
collection CERN
description Within scientific infrastructure, scientists execute millions of computational jobs daily, resulting in the movement of petabytes of data over the heterogeneous infrastructure. Monitoring the computing and user activities over such a complex infrastructure is incredibly demanding. Whereas present solutions are traditionally based on a Relational Database Management System (RDBMS) for data storage and processing, recent developments evaluate the Lambda Architecture (LA). In particular, these studies have evaluated data storage and batch processing of large-scale monitoring datasets using Hadoop and its MapReduce framework. Although the LA performed better than the RDBMS in that evaluation, it was fairly complex to implement and maintain. This paper presents an Optimised Lambda Architecture (OLA) using the Apache Spark ecosystem, which involves modelling an efficient way of joining batch computation and real-time computation transparently without the need to add complexity. A few models were explored: pure streaming, pure batch computation, and the combination of both batch and streaming. Evaluations of the OLA on the CERN IT on-premises Hadoop cluster and on the public Amazon cloud infrastructure are presented for the WLCG Data acTivities (WDT) monitoring use case, demonstrating how the new architecture can offer benefits by combining both batch and real-time processing to compensate for batch-processing latency.
id oai-inspirehep.net-1845288
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2021
record_format invenio
title Optimised Lambda Architecture for Monitoring Scientific Infrastructure
topic Computing and Computers
url https://dx.doi.org/10.1109/tpds.2017.2772241
http://cds.cern.ch/record/2751541
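
The description above speaks of combining batch and real-time processing to compensate for batch-processing latency. A minimal sketch of that idea follows, in plain Python rather than Apache Spark, and not the authors' implementation. The names TransferEvent, aggregate, merged_view and batch_cutoff are hypothetical, introduced only for illustration. The same aggregation function serves both layers, and a query merges the batch view with a real-time view covering events the last batch run has not yet absorbed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferEvent:
    """Hypothetical monitoring event: one data transfer seen at a site."""
    timestamp: int    # seconds since epoch
    site: str
    bytes_moved: int


def aggregate(events):
    """Sum bytes moved per site; shared by the batch and real-time layers."""
    totals = Counter()
    for event in events:
        totals[event.site] += event.bytes_moved
    return totals


def merged_view(events, batch_cutoff):
    """Answer a query by merging the two views.

    Events before batch_cutoff are assumed to be covered by the last
    batch run; newer events are covered only by the real-time layer.
    """
    batch_view = aggregate(e for e in events if e.timestamp < batch_cutoff)
    realtime_view = aggregate(e for e in events if e.timestamp >= batch_cutoff)
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds per-site totals from the newer events
    return merged
```

Because one function computes both views, the logic is written once instead of separately for each layer, which is the kind of complexity the description says the original Lambda Architecture suffered from.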