PanDA Beyond ATLAS: A Scalable Workload Management System for Data Intensive Science

Bibliographic Details
Main authors: Borodin, M, De, K, Jha, S, Golubkov, D, Klimentov, A, Maeno, T, Nilsson, P, Oleynik, D, Panitkin, S, Petrosyan, A, Schovancova, J, Vaniachine, A, Wenaus, T
Language: English
Published: 2014
Subjects:
Online access: http://cds.cern.ch/record/1670021
Description
Summary: The LHC experiments are today at the leading edge of large-scale, distributed, data-intensive computational science. The data volumes processed by the LHC's ATLAS experiment are particularly extreme: over 140 PB to date, distributed worldwide across more than 120 sites. An important element in the success of the exciting physics results from ATLAS is the highly scalable integrated workflow and dataflow management afforded by the PanDA workload management system, which is used for all of the experiment's distributed computing needs. The PanDA design is not experiment-specific, and PanDA is now being extended to support other data-intensive scientific applications. PanDA was cited as an example of "a high performance, fault tolerant software for fast, scalable access to data repositories of many kinds" during the announcement of the "Big Data Research and Development Initiative", a US$200 million U.S. government investment in tools to handle the huge volumes of digital data needed to spur science and engineering discoveries. In this talk, we will describe the new program of work to develop a generic version of PanDA, as well as the progress in extending PanDA's capabilities to support supercomputers and clouds and to leverage intelligent networking, while accommodating the ever-growing needs of current users. In particular, we will present our plans to refactor PanDA and to develop a virtual-organization-neutral (VO-neutral) workload management system (WMS) package for use by new experiments, such as LSST and LBNE, as well as by the running LHC experiments. PanDA has already demonstrated, at a very large scale, the value of automated, data-aware, dynamic brokering of diverse workloads across distributed computing resources. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS' experience and proven tools for highly scalable processing.