
Apache Flink: Distributed Stream Data Processing

Bibliographic Details
Main Authors: Jacobs, Kevin, Surdy, Kacper
Language: eng
Published: 2016
Subjects: Computing and Computers
Online Access: http://cds.cern.ch/record/2208322
_version_ 1780951737164627968
author Jacobs, Kevin
Surdy, Kacper
author_facet Jacobs, Kevin
Surdy, Kacper
author_sort Jacobs, Kevin
collection CERN
description The amount of data has grown significantly over the past few years, and with it the need for distributed data processing frameworks. Currently, there are two well-known data processing frameworks that offer both a batch API and a stream API: Apache Flink and Apache Spark. Both improve upon the MapReduce implementation of the Apache Hadoop framework; MapReduce is the first programming model for large-scale distributed processing available in Apache Hadoop. This report compares the Stream API and the Batch API of both frameworks.
id cern-2208322
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2016
record_format invenio
spelling cern-2208322 2019-09-30T06:29:59Z http://cds.cern.ch/record/2208322 eng Jacobs, Kevin; Surdy, Kacper. Apache Flink: Distributed Stream Data Processing. Computing and Computers. The amount of data has grown significantly over the past few years, and with it the need for distributed data processing frameworks. Currently, there are two well-known data processing frameworks that offer both a batch API and a stream API: Apache Flink and Apache Spark. Both improve upon the MapReduce implementation of the Apache Hadoop framework; MapReduce is the first programming model for large-scale distributed processing available in Apache Hadoop. This report compares the Stream API and the Batch API of both frameworks. CERN-IT-Note-2016-006 oai:cds.cern.ch:2208322 2016-09-16
spellingShingle Computing and Computers
Jacobs, Kevin
Surdy, Kacper
Apache Flink: Distributed Stream Data Processing
title Apache Flink: Distributed Stream Data Processing
title_full Apache Flink: Distributed Stream Data Processing
title_fullStr Apache Flink: Distributed Stream Data Processing
title_full_unstemmed Apache Flink: Distributed Stream Data Processing
title_short Apache Flink: Distributed Stream Data Processing
title_sort apache flink: distributed stream data processing
topic Computing and Computers
url http://cds.cern.ch/record/2208322
work_keys_str_mv AT jacobskevin apacheflinkdistributedstreamdataprocessing
AT surdykacper apacheflinkdistributedstreamdataprocessing
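
To illustrate the kind of interface the report compares, below is a minimal word-count sketch against Flink's DataStream (streaming) API, written for the Flink 1.x Java API that was current when the report was published. It is not taken from the report; the socket source address ("localhost", 9999) and the job name are placeholder assumptions chosen only for illustration.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        // Entry point for a Flink streaming job.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: read lines of text from a local socket (host/port are illustrative).
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        DataStream<Tuple2<String, Integer>> counts = lines
            // Split each line into (word, 1) pairs.
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                }
            })
            // Group by the word (field 0 of the tuple) and keep a running sum of field 1.
            .keyBy(0)
            .sum(1);

        counts.print();
        env.execute("Streaming WordCount (illustrative)");
    }
}

The batch side of the comparison uses Flink's DataSet API of the same era, which exposes near-identical operators (flatMap, groupBy, sum) over bounded inputs; that symmetry between the two APIs is part of what makes a like-for-like comparison with Spark's batch and streaming APIs feasible.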