LHCb Data Management: consistency, integrity and coherence of data
Main author: | Bargiotti, Marianne |
---|---|
Language: | eng |
Published: | 2007 |
Subjects: | Detectors and Experimental Techniques; Computing and Computers |
Online access: | http://cds.cern.ch/record/1120785 |
author | Bargiotti, Marianne |
collection | CERN |
description | The Large Hadron Collider (LHC) at CERN will start operating in 2007. The LHCb experiment is preparing for real data handling and analysis via a series of data challenges and production exercises. The aim of these activities is to demonstrate the readiness of the computing infrastructure based on WLCG (Worldwide LHC Computing Grid) technologies, to validate the computing model and to provide useful samples of data for detector and physics studies. DIRAC (Distributed Infrastructure with Remote Agent Control) is the LHCb gateway to WLCG. The DIRAC Data Management System (DMS) relies both on WLCG Data Management services (LCG File Catalogue, Storage Resource Managers and File Transfer Service) and on LHCb-specific components (the Bookkeeping Metadata File Catalogue). Although the DIRAC DMS has been used extensively over the past years and has proved highly mature and reliable, the complexity of the DMS, its interactions with numerous WLCG components, and the instability of the facilities concerned have frequently led to unexpected problems in data movement and/or data registration. Such problems make it impossible to maintain at all times a coherent picture of the experiment's data on the Grid across the various services involved. The LHCb policy on these issues has been to invest resources in minimising the number of occurrences of data corruption, missing data, and incoherence and inconsistency between catalogues and physical storage, both through safety measures at the data management level (failover mechanisms, checksums, roll-back mechanisms) and through expensive background checks. The data integrity and consistency checking activity is presented here. Its goal is to maintain a consistent picture across the main catalogues (Bookkeeping and LFC) and the Storage Elements: primarily among themselves, and then with respect to the computing model.
While actively reducing the number of these interventions remains the main goal of the DMS in LHCb, the outcome of these checks also provides a lucid evaluation of the quality of service offered by the underlying Grid infrastructure. The planned activity on data integrity, consistency and coherence in the Grid is directed towards the development, in the near future, of a generic tool suite able to categorise, analyse and systematically cure the disparate problems affecting data management. The advantages are that the effort spent solving immediate problems can be embedded in more generic, higher-level tools, and that fixes to some problems can also be applied to DIRAC to avoid their recurrence. |
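The consistency checks described in the abstract amount to cross-referencing the file catalogues against the Storage Elements and flagging disagreements. The following is a minimal illustrative sketch of that idea, not the actual DIRAC/LHCb code; the `FileRecord` layout, the function name, and the LFN paths are all hypothetical, and a real check would query the LFC and the SRM rather than in-memory lists.

```python
# Hypothetical sketch of a catalogue-vs-storage consistency check.
# Not DIRAC code: record layout and names are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class FileRecord:
    lfn: str        # logical file name
    size: int       # size in bytes
    checksum: str   # e.g. an Adler-32 digest as hex

def consistency_report(catalogue, storage):
    """Compare catalogue entries with storage replicas, keyed by LFN.

    Returns LFNs missing from storage, 'dark' data present only in
    storage, and files whose size or checksum disagree.
    """
    cat = {r.lfn: r for r in catalogue}
    sto = {r.lfn: r for r in storage}
    missing = sorted(cat.keys() - sto.keys())   # registered but not stored
    dark = sorted(sto.keys() - cat.keys())      # stored but not registered
    mismatched = sorted(
        lfn for lfn in cat.keys() & sto.keys()
        if (cat[lfn].size, cat[lfn].checksum)
           != (sto[lfn].size, sto[lfn].checksum)
    )
    return {"missing": missing, "dark": dark, "mismatched": mismatched}
```

Each of the three buckets corresponds to a class of inconsistency mentioned in the abstract: missing data, unregistered ("dark") data, and corruption detected via size or checksum mismatches.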
id | cern-1120785 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2007 |
record_format | invenio |
title | LHCb Data Management: consistency, integrity and coherence of data |
topic | Detectors and Experimental Techniques Computing and Computers |
url | http://cds.cern.ch/record/1120785 |