Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)

The Large Hadron Collider has been in a scheduled two-year maintenance shutdown since December 2018. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data...

Bibliographic Details
Main author: Motesnitsalis, Evangelos
Language: eng
Published: 2019
Subjects:
Online access: http://cds.cern.ch/record/2666394
_version_ 1780961997448282112
author Motesnitsalis, Evangelos
author_facet Motesnitsalis, Evangelos
author_sort Motesnitsalis, Evangelos
collection CERN
description The Large Hadron Collider has been in a scheduled two-year maintenance shutdown since December 2018. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data, more and more teams at CERN are choosing to use Big Data technologies to perform physics analysis and "data reduction", i.e. to produce smaller, reusable datasets for frequent access. These technologies show great potential for speeding up existing procedures. This lecture provides an overview of the latest big data technologies in the Hadoop and Spark ecosystems, with a focus on their main architectural characteristics, and then addresses a number of important questions: How can we perform physics analysis with Big Data technologies? What problems are faced? What are the challenges and the available data sources? In which other domains are Big Data analytics applied at CERN?
id cern-2666394
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2666394 | 2022-11-02T22:32:37Z | http://cds.cern.ch/record/2666394 | eng | Motesnitsalis, Evangelos | Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) | Inverted CERN School of Computing 2019 | Inverted CSC | The Large Hadron Collider has been in a scheduled two-year maintenance shutdown since December 2018. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data, more and more teams at CERN are choosing to use Big Data technologies to perform physics analysis and "data reduction", i.e. to produce smaller, reusable datasets for frequent access. These technologies show great potential for speeding up existing procedures. This lecture provides an overview of the latest big data technologies in the Hadoop and Spark ecosystems, with a focus on their main architectural characteristics, and then addresses a number of important questions: How can we perform physics analysis with Big Data technologies? What problems are faced? What are the challenges and the available data sources? In which other domains are Big Data analytics applied at CERN? | oai:cds.cern.ch:2666394 | 2019
spellingShingle Inverted CSC
Motesnitsalis, Evangelos
Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
title Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
title_full Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
title_fullStr Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
title_full_unstemmed Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
title_short Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
title_sort big data technologies and physics analysis with apache spark (lecture 1)
topic Inverted CSC
url http://cds.cern.ch/record/2666394
work_keys_str_mv AT motesnitsalisevangelos bigdatatechnologiesandphysicsanalysiswithapachesparklecture1
AT motesnitsalisevangelos invertedcernschoolofcomputing2019