
Predicting data popularity using Bayesian networks over ATLAS grid sites

Bibliographic Details
Main authors: Titov, M; Záruba, G; Klimentov, A; De, K
Language: English
Published: 2012
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/1462239
collection CERN
description One of the primary tasks in resource utilization in ATLAS Distributed Computing is to replicate newly obtained data from the Production and Distributed Analysis System (PanDA) over the grid while minimizing the number of data replicas; on the other hand, if a dataset becomes popular, additional replicas should be encouraged to distribute the workload more evenly. To make this feasible, it is important to know, with good probability, how popular particular datasets will be in the future. We focus on the analysis of data usage in the PanDA system, which provides efficient and transparent utilization of the grid for production and analysis tasks. The initial data popularity analysis was presented in “A Probabilistic Analysis of Data Popularity in ATLAS Data Caching”, and from it emerged the idea of using Bayesian networks (a high-level representation of a probability distribution over a set of stochastic variables, used to build a model of the problem domain) for popularity prediction, with conditioning parameters defined on the basis of the data usage analysis.
id cern-1462239
institution European Organization for Nuclear Research
report number ATL-SOFT-SLIDE-2012-444
OAI identifier oai:cds.cern.ch:1462239 (2012-07-17)
topic Detectors and Experimental Techniques
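The abstract describes a Bayesian network as a representation of a probability distribution over a set of stochastic variables, used to model the problem domain for popularity prediction. As a minimal sketch only, the Python below shows how such a network factorises a joint distribution along its edges and answers a popularity query by exact enumeration. The three binary variables (`data_type`, `early`, `popular`), the network structure, and every probability are hypothetical placeholders; none of them come from this record or the slides, and this is not the authors' model.

```python
from itertools import product

# HYPOTHETICAL network for illustration only (not the authors' model):
#   data_type -> early
#   data_type, early -> popular
# data_type: True = "analysis-type dataset", False = "other" (assumed)
# early:     True = "high early access count" (assumed)
# popular:   True = "dataset becomes popular"

# Prior P(data_type); placeholder numbers.
P_TYPE = {True: 0.3, False: 0.7}

# P(early = True | data_type); placeholder numbers.
P_EARLY_GIVEN_TYPE = {True: 0.6, False: 0.2}

# P(popular = True | data_type, early); placeholder numbers.
P_POP_GIVEN = {
    (True, True): 0.8,
    (True, False): 0.3,
    (False, True): 0.5,
    (False, False): 0.05,
}


def joint(data_type: bool, early: bool, popular: bool) -> float:
    """Joint probability, factorised along the network's edges."""
    p = P_TYPE[data_type]
    p_early = P_EARLY_GIVEN_TYPE[data_type]
    p *= p_early if early else 1.0 - p_early
    p_pop = P_POP_GIVEN[(data_type, early)]
    p *= p_pop if popular else 1.0 - p_pop
    return p


def query_popular(evidence: dict) -> float:
    """P(popular = True | evidence) by exact enumeration.

    `evidence` may fix 'data_type' and/or 'early'; any unobserved
    variable is summed out.
    """
    numerator = 0.0
    denominator = 0.0
    for data_type, early, popular in product([True, False], repeat=3):
        assignment = {"data_type": data_type, "early": early}
        # Skip assignments inconsistent with the observed evidence.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(data_type, early, popular)
        denominator += p
        if popular:
            numerator += p
    return numerator / denominator
```

With these placeholder numbers, `query_popular({})` gives the prior popularity 0.278, while `query_popular({"early": True})` gives 0.66875: observing high early usage raises the predicted popularity, which is the kind of conditioning a replication policy could act on. Enumeration is exponential in the number of variables, so a real model with many conditioning parameters would more likely use a dedicated inference library.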