Performance Benchmarking Big Data Platforms in the Cloud

On-Demand Webinar

Performance is often a key factor in choosing a big data platform. Over the past few years, enterprises have rapidly adopted Apache Spark™, which has become the de facto data processing engine thanks to its performance and ease of use.

Since starting the Spark project, our team at Databricks has focused on accelerating innovation by building the most performant and optimized Unified Analytics Platform for the cloud. Join Reynold Xin, Co-founder and Chief Architect of Databricks, as he discusses the results of our benchmark, run according to the industry-standard TPC-DS requirements. The benchmark compares the Databricks Runtime (which includes Apache Spark and our DBIO accelerator module) with vanilla open source Spark in the cloud. Reynold will also explain how these performance gains can have a meaningful impact on your total cost of ownership (TCO) for managing Spark.


This webinar covers:

  • Differences between open source Spark and Databricks Runtime.
  • Details on the benchmark including hardware configuration, dataset, etc.
  • Summary of the benchmark results which reveal performance gains by up to 5x over open source Spark and other big data engines.
  • A live demo comparing processing speeds of Databricks Runtime vs. open source Spark.
  • Special announcement: We will unveil an experimental feature that aims to speed up your workloads even further. Be the first to see this feature in action. Register today!


Presenters
  • Reynold Xin

    Co-founder and Chief Architect

    Reynold oversees Databricks' technical contributions to Apache Spark and Databricks Runtime, and he initiated efforts such as DataFrames, Project Tungsten, and Spark 2.0. To demonstrate Spark's scalability and performance, he led the effort that set the 2014 Daytona GraySort world record, beating the previous record, held by Hadoop, with 30X higher per-node efficiency. He was also part of the team that set the 2016 CloudSort record for the most efficient, lowest-cost software to sort 100TB of data in the cloud, beating the 2015 record by 3X.