
A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment

Scientific computing has advanced in how it deals with massive amounts of data, as production capacities have increased significantly over the last decades. Most large science experiments require vast computing and data storage resources in order to provide results or predictions based...


Bibliographic Details
Main Authors: Titov, Mikhail, Zaruba, Gergely, De, Kaushik, Klimentov, Alexei, Jha, S.
Language: eng
Published: 2017
Subjects: Particle Physics - Experiment
Online Access: https://dx.doi.org/10.1088/1742-6596/1085/4/042028
http://cds.cern.ch/record/2290160
_version_ 1780956290205351936
author Titov, Mikhail
Zaruba, Gergely
De, Kaushik
Klimentov, Alexei
Jha, S.
author_facet Titov, Mikhail
Zaruba, Gergely
De, Kaushik
Klimentov, Alexei
Jha, S.
author_sort Titov, Mikhail
collection CERN
description Scientific computing has advanced in how it deals with massive amounts of data, as production capacities have increased significantly over the last decades. Most large science experiments require vast computing and data storage resources in order to provide results or predictions based on the data obtained. For scientific distributed computing systems with hundreds of petabytes of data and thousands of users, it is important to keep track not just of how data is distributed in the system, but also of individual users' interests in the distributed data (that is, to reveal implicit interconnections between users and data objects). This, however, requires the collection and use of specific statistics such as correlations between data distribution, the mechanics of data distribution, and, mainly, user preferences. This work focuses on user activities (specifically, data usage) and interests in one such distributed computing system, namely PanDA (Production ANd Distributed Analysis system). PanDA is a high-performance workload management system originally designed to meet production and analysis requirements for a data-driven workload at the Large Hadron Collider Computing Grid for the ATLAS Experiment hosted at CERN (the European Organization for Nuclear Research). In this work we investigate whether data collected in the past in PanDA shows any trends indicating that users could have mutual interests that would persist in subsequent data usage (i.e., data usage patterns), using data mining techniques such as association analysis, sequential pattern mining, and the basics of the recommender system approach. We show that such common interests between users indeed exist and thus could be used to provide recommendations (in terms of collaborative filtering) to help users with their data selection process.
id cern-2290160
institution European Organization for Nuclear Research
language eng
publishDate 2017
record_format invenio
spelling cern-2290160
2021-02-09T10:05:33Z
doi:10.1088/1742-6596/1085/4/042028
http://cds.cern.ch/record/2290160
eng
Titov, Mikhail
Zaruba, Gergely
De, Kaushik
Klimentov, Alexei
Jha, S.
A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
Particle Physics - Experiment
Scientific computing has advanced in how it deals with massive amounts of data, as production capacities have increased significantly over the last decades. Most large science experiments require vast computing and data storage resources in order to provide results or predictions based on the data obtained. For scientific distributed computing systems with hundreds of petabytes of data and thousands of users, it is important to keep track not just of how data is distributed in the system, but also of individual users' interests in the distributed data (that is, to reveal implicit interconnections between users and data objects). This, however, requires the collection and use of specific statistics such as correlations between data distribution, the mechanics of data distribution, and, mainly, user preferences. This work focuses on user activities (specifically, data usage) and interests in one such distributed computing system, namely PanDA (Production ANd Distributed Analysis system). PanDA is a high-performance workload management system originally designed to meet production and analysis requirements for a data-driven workload at the Large Hadron Collider Computing Grid for the ATLAS Experiment hosted at CERN (the European Organization for Nuclear Research). In this work we investigate whether data collected in the past in PanDA shows any trends indicating that users could have mutual interests that would persist in subsequent data usage (i.e., data usage patterns), using data mining techniques such as association analysis, sequential pattern mining, and the basics of the recommender system approach. We show that such common interests between users indeed exist and thus could be used to provide recommendations (in terms of collaborative filtering) to help users with their data selection process.
ATL-SOFT-PROC-2017-060
oai:cds.cern.ch:2290160
2017-10-22
spellingShingle Particle Physics - Experiment
Titov, Mikhail
Zaruba, Gergely
De, Kaushik
Klimentov, Alexei
Jha, S.
A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
title A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
title_full A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
title_fullStr A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
title_full_unstemmed A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
title_short A study of the applicability of recommender systems for the Production and Distributed Analysis system PanDA of the ATLAS Experiment
title_sort study of the applicability of recommender systems for the production and distributed analysis system panda of the atlas experiment
topic Particle Physics - Experiment
url https://dx.doi.org/10.1088/1742-6596/1085/4/042028
http://cds.cern.ch/record/2290160
work_keys_str_mv AT titovmikhail astudyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT zarubagergely astudyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT dekaushik astudyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT klimentovalexei astudyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT jhas astudyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT titovmikhail studyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT zarubagergely studyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT dekaushik studyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT klimentovalexei studyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment
AT jhas studyoftheapplicabilityofrecommendersystemsfortheproductionanddistributedanalysissystempandaoftheatlasexperiment