Deep Dive: Apache Spark™ Memory Management

On-Demand Webinar

Memory management is at the heart of any data-intensive system. Apache Spark™, in particular, must arbitrate memory allocation between two main use cases: buffering intermediate data for processing (execution) and caching user data (storage). This talk will take a deep dive through the memory management designs adopted in Spark since its inception and discuss their performance and usability implications for the end user.
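
As a rough illustration of the execution/storage split described above, here is a minimal sketch of the sizing arithmetic used by the unified memory manager in recent Spark releases. It assumes the documented defaults for `spark.memory.fraction` (0.6) and `spark.memory.storageFraction` (0.5) and a 300 MiB reserved region. The function name `unified_memory_split` is hypothetical and is not part of any Spark API.

```python
# Sketch only: approximates how executor heap is divided under Spark's
# unified memory manager. The helper name and return shape are illustrative.

RESERVED_MIB = 300  # assumption: reserved memory set aside before the split


def unified_memory_split(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Return approximate unified, storage, and execution sizes in MiB.

    memory_fraction mirrors spark.memory.fraction and storage_fraction
    mirrors spark.memory.storageFraction (defaults assumed from the docs).
    """
    if heap_mib <= RESERVED_MIB:
        raise ValueError("heap must be larger than the reserved region")
    unified = (heap_mib - RESERVED_MIB) * memory_fraction
    storage = unified * storage_fraction  # portion protected from eviction by execution
    execution = unified - storage         # remainder initially available to execution
    return {"unified": unified, "storage": storage, "execution": execution}


if __name__ == "__main__":
    # Example: a 4 GiB executor heap.
    for region, size in unified_memory_split(4096).items():
        print(f"{region}: {size:.1f} MiB")
```

The storage figure is a soft boundary rather than a hard cap: in the unified design, execution and storage can borrow from each other's unused share, so the numbers above describe a starting point and not a fixed partition.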

  • Andrew Or

    Software Engineer - Databricks

    Andrew is an Apache Spark PMC member. He has contributed several large features to the project, including event logging, external spilling, the history server, dynamic allocation, and DAG visualization in the Spark UI. He is an active maintainer of the Spark on YARN integration component.