In the Apache Spark™ 2.x releases, Machine Learning (ML) is focusing on DataFrame-based APIs. This webinar is aimed at helping users take full advantage of the new APIs. Topics will include migrating workloads from RDDs to DataFrames, ML persistence for saving and loading models, and the roadmap ahead.
Migrating ML workloads to use Spark DataFrames and Datasets allows users to benefit from simpler APIs, plus speed and scalability improvements. As the DataFrame/Dataset API becomes the primary API for data in Spark, this migration will become increasingly important to MLlib users, especially for integrating ML with the rest of Spark data processing workloads. We will give a tutorial covering best practices and some of the immediate and future benefits to expect.
ML persistence is one of the biggest improvements in the DataFrame-based API. With Spark 2.0, almost all ML algorithms can be saved and loaded, even across languages. ML persistence dramatically simplifies collaborating across teams and moving ML models to production. We will demonstrate how to use persistence, and we will discuss a few existing issues and workarounds.
At the end of the webinar, we will discuss major roadmap items. These include API coverage, major speed and scalability improvements to certain algorithms, and integration with structured streaming.