
High performance Spark: best practices for scaling and optimizing Apache Spark

Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help you...


Bibliographic Details
Main authors:	Karau, Holden; Warren, Rachel
Language:	eng
Published:	O'Reilly 2017
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/2269084

_version_	1780954695243661312
author	Karau, Holden; Warren, Rachel
author_facet	Karau, Holden; Warren, Rachel
author_sort	Karau, Holden
collection	CERN
description	Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing. With this book, you’ll explore: how Spark SQL’s new interfaces improve performance over SQL’s RDD data structure; the choice between data joins in Core Spark and Spark SQL; techniques for getting the most out of standard RDD transformations; how to work around performance issues in Spark’s key/value pair paradigm; writing high-performance Spark code without Scala or the JVM; how to test for functionality and performance when applying suggested improvements; using Spark MLlib and Spark ML machine learning libraries; and Spark’s Streaming components and external community packages.
id	cern-2269084
institution	Organización Europea para la Investigación Nuclear
language	eng
publishDate	2017
publisher	O'Reilly
record_format	invenio
spelling	cern-2269084 2021-04-21T19:11:20Z http://cds.cern.ch/record/2269084 eng Karau, Holden Warren, Rachel High performance Spark: best practices for scaling and optimizing Apache Spark Computing and Computers Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing. With this book, you’ll explore: How Spark SQL’s new interfaces improve performance over SQL’s RDD data structure The choice between data joins in Core Spark and Spark SQL Techniques for getting the most out of standard RDD transformations How to work around performance issues in Spark’s key/value pair paradigm Writing high-performance Spark code without Scala or the JVM How to test for functionality and performance when applying suggested improvements Using Spark MLlib and Spark ML machine learning libraries Spark’s Streaming components and external community packages O'Reilly oai:cds.cern.ch:2269084 2017
spellingShingle	Computing and Computers Karau, Holden Warren, Rachel High performance Spark: best practices for scaling and optimizing Apache Spark
title	High performance Spark: best practices for scaling and optimizing Apache Spark
title_full	High performance Spark: best practices for scaling and optimizing Apache Spark
title_fullStr	High performance Spark: best practices for scaling and optimizing Apache Spark
title_full_unstemmed	High performance Spark: best practices for scaling and optimizing Apache Spark
title_short	High performance Spark: best practices for scaling and optimizing Apache Spark
title_sort	high performance spark: best practices for scaling and optimizing apache spark
topic	Computing and Computers
url	http://cds.cern.ch/record/2269084
work_keys_str_mv	AT karauholden highperformancesparkbestpracticesforscalingandoptimizingapachespark AT warrenrachel highperformancesparkbestpracticesforscalingandoptimizingapachespark

