
CMS users data management service integration and first experiences with its NoSQL data storage

The distributed data analysis workflow in CMS assumes that jobs run in a different location from where their results are finally stored. Typically, the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This new service was originally developed to addres...


Bibliographic Details
Main authors: Riahi, H, Ciangottini, D, Cinquilli, M, Hernandez, J M, Konstantinov, P, Mascheroni, M, Santocchia, A
Language: eng
Published: 2013
Subjects:
Online access: https://dx.doi.org/10.1088/1742-6596/513/3/032079
http://cds.cern.ch/record/1623288
author Riahi, H
Ciangottini, D
Cinquilli, M
Hernandez, J M
Konstantinov, P
Mascheroni, M
Santocchia, A
collection CERN
description The distributed data analysis workflow in CMS assumes that jobs run in a different location from where their results are finally stored. Typically, the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This new service was originally developed to address the inefficient use of CMS computing resources when analysis job outputs are transferred synchronously from the job execution node to the remote site as soon as they are produced. The AsyncStageOut is designed as a thin application relying only on the NoSQL database (CouchDB) as input and data storage. It has progressed from a limited prototype to a highly adaptable service which manages and monitors every step of user file handling, namely file transfer and publication. The AsyncStageOut is integrated with the Common CMS/ATLAS Analysis Framework. It foresees the management of nearly 200k user files per day from close to 1000 individual users per month with minimal delays, while providing real-time monitoring and reports to users and service operators and remaining highly available. The associated data volume represents a new set of challenges in the areas of database scalability and service performance and efficiency. In this paper, we present an overview of the AsyncStageOut model and the integration strategy with the Common Analysis Framework. The motivations for using the NoSQL technology are also presented, as well as the data design and the techniques used for efficient indexing and monitoring of the data. We describe the deployment model for the high availability and scalability of the service. We also discuss the hardware requirements and the results achieved, as determined by testing with actual data and realistic loads during the commissioning and the initial production phase with the Common Analysis Framework.
id cern-1623288
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2013
record_format invenio
spelling cern-1623288; 2019-09-30T06:29:59Z; doi:10.1088/1742-6596/513/3/032079; http://cds.cern.ch/record/1623288; eng; CMS-CR-2013-371; oai:cds.cern.ch:1623288; 2013-10-29
title CMS users data management service integration and first experiences with its NoSQL data storage
topic Detectors and Experimental Techniques
url https://dx.doi.org/10.1088/1742-6596/513/3/032079
http://cds.cern.ch/record/1623288