State-of-the-Art Deep Learning on Apache Spark™
Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. As a result, more and more Spark users want to integrate Spark with distributed machine learning frameworks built for this kind of training.
Here's the problem: big data frameworks like Spark and distributed deep learning frameworks don't play well together because of the disparity between how big data jobs and deep learning jobs are executed.
During this webinar, we'll share how Project Hydrogen, a Spark Project Improvement Proposal led by Databricks, is positioned as a potential solution to this dilemma.
We will cover:
- Barrier execution mode for distributed DL training (see the sketch after this list)
- Fast data exchange between Spark and DL frameworks
- Accelerator-aware scheduling
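For a taste of the first topic before the session, here is a minimal sketch of Spark's barrier execution mode using the RDD.barrier() API available since Spark 2.4. The application name, the local[4] master setting, and the train function are illustrative assumptions rather than material from the webinar; a real job would hand the discovered worker addresses to a DL framework's initialization.

```python
# Minimal sketch of barrier execution mode (RDD.barrier(), available since Spark 2.4).
# The app name, local[4] master, and train() body are illustrative assumptions.
from pyspark import BarrierTaskContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("barrier-demo").getOrCreate()

def train(iterator):
    # Every task in a barrier stage is launched at the same time and gets a
    # BarrierTaskContext, unlike regular Spark tasks that may run in waves.
    ctx = BarrierTaskContext.get()
    ctx.barrier()  # global synchronization point across all tasks in the stage
    # Discover peer addresses; a DL framework would use these to set up its own
    # communication (for example, an all-reduce ring) before training starts.
    workers = [info.address for info in ctx.getTaskInfos()]
    yield (ctx.partitionId(), workers)

rdd = spark.sparkContext.parallelize(range(8), numSlices=4)
print(rdd.barrier().mapPartitions(train).collect())
```

The key difference from a plain mapPartitions call is the all-or-nothing scheduling: either all tasks in the stage start together or the whole stage is retried, which matches the gang-scheduling assumption of distributed DL frameworks.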
Join us for what will no doubt be an insightful session!
Presenter
Xiangrui Meng
Software Engineer, Databricks
Xiangrui Meng is an Apache Spark PMC member and a software engineer at Databricks. His main interests center on developing and implementing scalable algorithms for scientific applications. He has been actively involved in the development and maintenance of Spark MLlib since he joined Databricks. Before Databricks, he worked as an applied research engineer at LinkedIn, where he was the main developer of an offline machine learning framework built on Hadoop MapReduce. His Ph.D. work at Stanford was on randomized algorithms for large-scale linear regression problems.