Big Data Technologies and Physics Analysis with Apache Spark (lecture 1)
The Large Hadron Collider has been shut down since December 2018 for a scheduled two-year maintenance period. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data...
Main author: | Motesnitsalis, Evangelos |
---|---|
Language: | eng |
Published: | 2019 |
Subjects: | Inverted CSC |
Online access: | http://cds.cern.ch/record/2666394 |
_version_ | 1780961997448282112 |
---|---|
author | Motesnitsalis, Evangelos |
author_facet | Motesnitsalis, Evangelos |
author_sort | Motesnitsalis, Evangelos |
collection | CERN |
description | The Large Hadron Collider has been shut down since December 2018 for a scheduled two-year maintenance period. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data, a growing number of teams at CERN are choosing Big Data Technologies to perform Physics Analysis and "Data Reduction", i.e. to produce smaller, reusable datasets for frequent access. These technologies show great potential for speeding up the existing procedures. This lecture will provide an overview of the latest big data technologies in the Hadoop and Spark ecosystems, with a focus on their main architectural characteristics, and will then address a number of important questions: How can we perform Physics Analysis with Big Data Technologies? What problems are faced? What are the challenges and the available data sources? What are the other domains in which Big Data Analytics is applied at CERN? |
id | cern-2666394 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2019 |
record_format | invenio |
spelling | cern-2666394 2022-11-02T22:32:37Z http://cds.cern.ch/record/2666394 eng Motesnitsalis, Evangelos Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) Inverted CERN School of Computing 2019 Inverted CSC The Large Hadron Collider has been shut down since December 2018 for a scheduled two-year maintenance period. However, the data already collected between April 2015 and November 2018, which are stored in a dedicated custom storage service, exceed 150 PB in total. To analyse these data, a growing number of teams at CERN are choosing Big Data Technologies to perform Physics Analysis and "Data Reduction", i.e. to produce smaller, reusable datasets for frequent access. These technologies show great potential for speeding up the existing procedures. This lecture will provide an overview of the latest big data technologies in the Hadoop and Spark ecosystems, with a focus on their main architectural characteristics, and will then address a number of important questions: How can we perform Physics Analysis with Big Data Technologies? What problems are faced? What are the challenges and the available data sources? What are the other domains in which Big Data Analytics is applied at CERN? oai:cds.cern.ch:2666394 2019 |
spellingShingle | Inverted CSC Motesnitsalis, Evangelos Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) |
title | Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) |
title_full | Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) |
title_fullStr | Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) |
title_full_unstemmed | Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) |
title_short | Big Data Technologies and Physics Analysis with Apache Spark (lecture 1) |
title_sort | big data technologies and physics analysis with apache spark (lecture 1) |
topic | Inverted CSC |
url | http://cds.cern.ch/record/2666394 |
work_keys_str_mv | AT motesnitsalisevangelos bigdatatechnologiesandphysicsanalysiswithapachesparklecture1 AT motesnitsalisevangelos invertedcernschoolofcomputing2019 |