Apache Flink: Distributed Stream Data Processing
Main authors: | Jacobs, Kevin; Surdy, Kacper |
---|---|
Language: | English |
Published: | 2016 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/2208322 |
Field | Value |
---|---|
author | Jacobs, Kevin; Surdy, Kacper |
collection | CERN |
description | The amount of data has grown significantly over the past few years, and with it the need for distributed data processing frameworks. Two well-known frameworks currently offer both an API for data batches and an API for data streams: Apache Flink and Apache Spark. Both improve upon the MapReduce implementation of the Apache Hadoop framework; MapReduce was the first programming model for large-scale distributed processing available in Apache Hadoop. This report compares the Stream API and the Batch API of both frameworks. |
id | cern-2208322 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2016 |
record_format | invenio |
report_number | CERN-IT-Note-2016-006 |
oai_identifier | oai:cds.cern.ch:2208322 |
date | 2016-09-16 |
timestamp | 2019-09-30T06:29:59Z |
title | Apache Flink: Distributed Stream Data Processing |
topic | Computing and Computers |
url | http://cds.cern.ch/record/2208322 |
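The abstract contrasts batch and stream APIs built on ideas from the MapReduce model. As an illustration only (not taken from the report, and using neither the Flink nor the Spark API), the following plain-Python sketch shows a word count written both ways. The batch version maps, groups by key (the "shuffle") and reduces a bounded input once. The streaming version updates running state per record and emits a new result after each one. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch style: the whole bounded input is mapped, grouped by key
    and reduced once to produce a single final result."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            groups[word].append(one)  # shuffle: group values by key
    return {word: sum(ones) for word, ones in groups.items()}  # reduce


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: each record updates running state as it arrives and an
    updated result is emitted immediately, so the input may be unbounded."""
    state: Counter = Counter()
    for line in lines:
        for word, one in map_phase(line):
            state[word] += one
        yield dict(state)  # snapshot of the counts seen so far


if __name__ == "__main__":
    records = ["flink spark flink", "spark hadoop"]
    print(batch_word_count(records))
    for snapshot in streaming_word_count(records):
        print(snapshot)
```

On a finite input, the last snapshot emitted by the streaming version equals the batch result; the difference is that the streaming version produces intermediate answers and never needs to see the end of its input.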