
Apache Flink: Distributed Stream Data Processing

Bibliographic Details
Main Authors: Jacobs, Kevin, Surdy, Kacper
Language: eng
Published: 2016
Subjects: Computing and Computers
Online Access: http://cds.cern.ch/record/2208322
_version_ 1780951737164627968
author Jacobs, Kevin
Surdy, Kacper
author_facet Jacobs, Kevin
Surdy, Kacper
author_sort Jacobs, Kevin
collection CERN
description The amount of data has grown significantly over the past few years, and with it the need for distributed data processing frameworks. Currently, there are two well-known data processing frameworks that offer both a batch API and a stream API: Apache Flink and Apache Spark. Both improve upon the MapReduce implementation of the Apache Hadoop framework; MapReduce is the first programming model for large-scale distributed processing available in Apache Hadoop. This report compares the Stream API and the Batch API of both frameworks.
id cern-2208322
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2016
record_format invenio
spelling cern-2208322 2019-09-30T06:29:59Z http://cds.cern.ch/record/2208322 eng Jacobs, Kevin; Surdy, Kacper. Apache Flink: Distributed Stream Data Processing. Computing and Computers. The amount of data has grown significantly over the past few years, and with it the need for distributed data processing frameworks. Currently, there are two well-known data processing frameworks that offer both a batch API and a stream API: Apache Flink and Apache Spark. Both improve upon the MapReduce implementation of the Apache Hadoop framework; MapReduce is the first programming model for large-scale distributed processing available in Apache Hadoop. This report compares the Stream API and the Batch API of both frameworks. CERN-IT-Note-2016-006 oai:cds.cern.ch:2208322 2016-09-16
spellingShingle Computing and Computers
Jacobs, Kevin
Surdy, Kacper
Apache Flink: Distributed Stream Data Processing
title Apache Flink: Distributed Stream Data Processing
title_full Apache Flink: Distributed Stream Data Processing
title_fullStr Apache Flink: Distributed Stream Data Processing
title_full_unstemmed Apache Flink: Distributed Stream Data Processing
title_short Apache Flink: Distributed Stream Data Processing
title_sort apache flink: distributed stream data processing
topic Computing and Computers
url http://cds.cern.ch/record/2208322
work_keys_str_mv AT jacobskevin apacheflinkdistributedstreamdataprocessing
AT surdykacper apacheflinkdistributedstreamdataprocessing
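
To illustrate the kind of interface the report compares, below is a minimal word-count sketch against Flink's DataStream (streaming) API, written for the Flink 1.x Java API that was current when the report was published. It is not taken from the report; the socket source address ("localhost", 9999) and the job name are placeholder assumptions chosen only for illustration.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        // Entry point for a Flink streaming job.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: read lines of text from a local socket (host/port are illustrative).
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        DataStream<Tuple2<String, Integer>> counts = lines
            // Split each line into (word, 1) pairs.
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                }
            })
            // Group by the word (field 0 of the tuple) and keep a running sum of field 1.
            .keyBy(0)
            .sum(1);

        counts.print();
        env.execute("Streaming WordCount (illustrative)");
    }
}

The batch side of the comparison uses Flink's DataSet API of the same era, which exposes near-identical operators (flatMap, groupBy, sum) over bounded inputs; that symmetry between the two APIs is part of what makes a like-for-like comparison with Spark's batch and streaming APIs feasible.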