Hadoop Tutorials - Hadoop Foundations
Main authors: | , |
---|---|
Language: | eng |
Published: | 2016 |
Subjects: | |
Online access: | http://cds.cern.ch/record/2197972 |
Summary: | <p>The <strong>Hadoop</strong> ecosystem is the leading open-source platform for distributed storage and processing of "big data". The Hadoop platform is available at CERN as a central service provided by the IT department.</p>
<p>This tutorial, organized by the IT Hadoop service, introduces the main concepts of Hadoop technology in a practical way and is aimed at those who would like to <strong>start using the service for distributed parallel data processing</strong>.</p>
<p>The main <strong>topics</strong> that will be covered are:</p>
<ul>
<li>Hadoop <strong>architecture</strong> and available components</li>
<li>How to perform distributed parallel processing to explore example data and create reports with SQL (using <strong>Apache Impala</strong>).</li>
<li>Using HUE, the <strong>Hadoop web UI</strong>, to present the results in a user-friendly way.</li>
<li>How to format and/or structure data to make data processing more efficient, using various data formats/containers and partitioning techniques (<strong>Avro, Parquet, HBase</strong>). Best practices in this area will also be discussed.</li>
</ul>
<p>Attendees will have access to a <strong>test Hadoop</strong> system where they can perform hands-on exercises. Instructions will be provided by the speakers. To help with preparing the test environment, <strong>please register</strong> if you plan to attend.</p> |
---|