
An Application of Visual Analytics Methods to Cluster and Categorize Data Processing Jobs in High Energy and Nuclear Physics Experiments

Bibliographic Details
Main author: The ATLAS collaboration
Language: eng
Published: 2020
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2742496
collection CERN
description Hundreds of petabytes of experimental data in high energy and nuclear physics (HENP) have been collected by unique scientific facilities such as the LHC, RHIC and KEK. As the accelerators are upgraded to higher energy and luminosity, data volumes are growing rapidly and have reached the exabyte scale. This leads to an increase in the number of data processing and analysis tasks, which continuously compete for computational resources. The growing number of processing tasks requires an increase in the capacity of the computing infrastructure that can only be achieved through the use of high-performance computing resources. Together with the grid, these resources form a heterogeneous distributed computing environment of hundreds of distributed computing centers. Given a distributed model of data processing and analysis, the optimization of data and distributed data processing systems becomes a critical task, and the absence of an adequate solution leads to economic, functional and time losses. This paper describes the first stage of a study that aims to increase the stability and efficiency of workflow management systems for mega-science experiments by applying visual analytics methods, i.e. data analysis leveraging an interactive GUI. This would make it possible to create specialized guidelines for the pre-processing and brokering of computing jobs. Visual analytics methods are currently widely used in various domains of data analysis, including scientific research, engineering, management, financial monitoring and information security. With data analysis tools that support visualization, information can be analyzed by an individual who is well informed about the object of investigation but not necessarily aware of the internal structure of the data models.
Furthermore, visual analytics simplifies navigation through data analysis results: the data are represented by graphical objects that can be manipulated with a mouse or a touch-sensitive screen. Human spatial thinking is thereby actively used to identify new trends and patterns in the collected data, without requiring users to struggle with the underlying software. In this paper we demonstrate visual methods for clustering computing tasks of the distributed data processing system, using the ATLAS experiment at the LHC as an example. The interdependencies and correlations between various task and job parameters are investigated and graphically interpreted in an n-dimensional space using 3D projections. The visual analysis allows us to group similar jobs together, identify anomalous jobs, and determine the cause of such anomalies.
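The clustering-and-projection workflow the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the paper's actual pipeline: it uses synthetic job records with hypothetical parameters (CPU time, input size, memory, wall time, priority), k-means for grouping, and PCA for the reduction of the n-dimensional parameter space to 3D coordinates suitable for a scatter plot; the real study may use different parameters and methods.

```python
# Sketch (assumed workflow, not the paper's implementation): cluster synthetic
# "job" records described by numeric parameters, then project them to 3D for
# visual inspection, in the spirit of the approach described in the abstract.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical job parameters: CPU time, input size, memory, wall time, priority.
normal_jobs = rng.normal(loc=[100, 10, 4, 120, 50],
                         scale=[10, 2, 0.5, 15, 5], size=(300, 5))
# A small group of anomalous jobs: far higher CPU and wall time.
anomalous_jobs = rng.normal(loc=[500, 10, 4, 900, 50],
                            scale=[50, 2, 0.5, 60, 5], size=(10, 5))
jobs = np.vstack([normal_jobs, anomalous_jobs])

# Standardize so no single parameter dominates the distance metric.
X = StandardScaler().fit_transform(jobs)

# Group similar jobs; the small cluster flags the anomalous ones.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Project the 5-dimensional parameter space down to 3D coordinates
# (these are what a 3D scatter plot in an interactive GUI would display).
coords3d = PCA(n_components=3).fit_transform(X)

# The minority cluster corresponds to the anomalous jobs.
minority = min(set(labels), key=lambda c: int((labels == c).sum()))
print("jobs flagged as anomalous:", int((labels == minority).sum()))
```

In an interactive setting, `coords3d` would be rendered as a rotatable 3D scatter plot colored by cluster label, so that an analyst can spot the anomalous group and inspect which original parameters drive the separation.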
id cern-2742496
institution European Organization for Nuclear Research
language eng
publishDate 2020
record_format invenio
report_number ATL-SOFT-PUB-2020-005
oai oai:cds.cern.ch:2742496
date 2020-10-22
last_modified 2021-04-19T09:06:23Z
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2742496