Spark - a modern approach for distributed analytics
The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of big data. It is a very popular system for implementing data warehouses and data lakes. Spark has also emerged to be one o...
Main authors: | Surdy, Kacper; Kothuri, Prasanth |
---|---|
Language: | eng |
Published: | 2016 |
Subjects: | Workshops |
Online access: | http://cds.cern.ch/record/2214510 |
_version_ | 1780951976627929088 |
---|---|
author | Surdy, Kacper Kothuri, Prasanth |
author_facet | Surdy, Kacper Kothuri, Prasanth |
author_sort | Surdy, Kacper |
collection | CERN |
description | <!--HTML--><p>The <strong>Hadoop</strong> ecosystem is the leading open-source platform for distributed storage and processing of big data. It is a very popular system for implementing data warehouses and data lakes. <strong>Spark</strong> has also emerged as one of the leading engines for data analytics. The Hadoop platform is available at CERN as a central service provided by the IT department.</p>
<p>By attending the session, a participant will acquire knowledge of the essential <strong>concepts</strong> needed to benefit from the <strong>parallel data processing</strong> offered by the Spark framework. The session is structured around practical <strong>examples</strong> and tutorials.</p>
<p>Main topics:</p>
<ul>
<li><strong>Architecture </strong>overview - work distribution, concepts of a worker and a driver</li>
<li>Computing concepts of <strong>transformations </strong>and <strong>actions</strong></li>
<li>Data processing APIs - <strong>RDD, DataFrame, </strong>and <strong>SparkSQL</strong></li>
</ul> |
id | cern-2214510 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2016 |
record_format | invenio |
spelling | cern-22145102022-11-02T22:18:48Zhttp://cds.cern.ch/record/2214510engSurdy, KacperKothuri, PrasanthSpark - a modern approach for distributed analyticsSpark - a modern approach for distributed analyticsWorkshops<!--HTML--><p>The <strong>Hadoop</strong> ecosystem is the leading open-source platform for distributed storage and processing of big data. It is a very popular system for implementing data warehouses and data lakes. <strong>Spark</strong> has also emerged as one of the leading engines for data analytics. The Hadoop platform is available at CERN as a central service provided by the IT department.</p> <p>By attending the session, a participant will acquire knowledge of the essential <strong>concepts</strong> needed to benefit from the <strong>parallel data processing</strong> offered by the Spark framework. The session is structured around practical <strong>examples</strong> and tutorials.</p> <p>Main topics:</p> <ul> <li><strong>Architecture </strong>overview - work distribution, concepts of a worker and a driver</li> <li>Computing concepts of <strong>transformations </strong>and <strong>actions</strong></li> <li>Data processing APIs - <strong>RDD, DataFrame, </strong>and <strong>SparkSQL</strong></li> </ul>oai:cds.cern.ch:22145102016 |
spellingShingle | Workshops Surdy, Kacper Kothuri, Prasanth Spark - a modern approach for distributed analytics |
title | Spark - a modern approach for distributed analytics |
title_full | Spark - a modern approach for distributed analytics |
title_fullStr | Spark - a modern approach for distributed analytics |
title_full_unstemmed | Spark - a modern approach for distributed analytics |
title_short | Spark - a modern approach for distributed analytics |
title_sort | spark - a modern approach for distributed analytics |
topic | Workshops |
url | http://cds.cern.ch/record/2214510 |
work_keys_str_mv | AT surdykacper sparkamodernapproachfordistributedanalytics AT kothuriprasanth sparkamodernapproachfordistributedanalytics |