Mastering Apache® Spark™ 2.0

Building Data Pipelines Effortlessly

Apache Spark™ 2.0 marks a major step forward in ease of use, performance, and the unification of APIs across Spark components. It lays the foundation for a unified API for Structured Streaming and sets the course for how these unified APIs will be developed across Spark’s components in subsequent releases.
In this fourth eBook, we curate technical blogs and related assets. Whether you’re getting started with Spark or are an accomplished developer, this eBook will arm you with the knowledge to take full advantage of Spark 2.0. Topics include:
  • Introduction to Apache Spark 2.0’s Unified APIs for Datasets, DataFrames and SparkSessions
  • MLlib’s DataFrame-based APIs for machine learning
  • Spark SQL, Catalyst Optimizer and Tungsten’s Phase II performance enhancements
  • Continuous Applications: The evolution of Spark Streaming
  • How to employ Spark 2.0 Structured Streaming APIs
Download the eBook, Mastering Apache Spark 2.0, to learn more.
