Cargando…
ATLAS Analytics and Machine Learning Platforms
In 2015 ATLAS Distributed Computing started to migrate its monitoring systems away from Oracle DB and decided to adopt new big data platforms that are open source, horizontally scalable, and offer the flexibility of NoSQL systems. Three years later, the full software stack is in place, the system is...
Autores principales: | , , , |
---|---|
Lenguaje: | eng |
Publicado: |
2018
|
Materias: | |
Acceso en línea: | http://cds.cern.ch/record/2627020 |
Sumario: | In 2015 ATLAS Distributed Computing started to migrate its monitoring systems away from Oracle DB and decided to adopt new big data platforms that are open source, horizontally scalable, and offer the flexibility of NoSQL systems. Three years later, the full software stack is in place, the system is considered in production and operating at near maximum capacity (in terms of storage capacity and tightly coupled analysis capability). The new model provides several tools for fast and easy to deploy monitoring and accounting. The main advantages are: ample ways to do complex analytics studies (using technologies such as java, pig, spark, python, jupyter), flexibility in reorganization of data flows, near real time and inline processing. The analytics studies improve our understanding of different computing systems and their interplay, thus enabling whole-system debugging and optimization. In addition, the platform provides services to alarm or warn on anomalous conditions, and several services closing feedback loops with the Distributed Computing systems. Here we briefly describe the main system components and data flows, but will concentrate on both hardware and software tools we use for in depth analytics/simulations, support for machine learning algorithms, specifically artificial neural network training and reinforcement learning techniques. We describe several applications the platform enables, and discuss ways for further scale up. |
---|