Productionizing Apache Spark™ MLlib Models for Real-time Prediction Serving

On-Demand Webinar
Data science and machine learning tools traditionally focus on training models. When companies begin to employ machine learning in production workflows, they encounter new sources of friction, such as sharing models across teams, deploying identical models on different systems, and maintaining featurization logic. In this webinar, we discuss how Databricks provides a smooth path for productionizing Apache Spark™ MLlib models and featurization pipelines.

Databricks Model Scoring provides a simple API for exporting MLlib models and pipelines. These exported models can be deployed in many production settings, including:

  • External real-time low-latency prediction serving systems, without Spark dependencies,
  • Apache Spark Structured Streaming jobs, and
  • Apache Spark batch jobs.
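
To make the export workflow concrete, here is a minimal sketch of the kind of MLlib artifact being exported. It is not the Databricks Model Scoring API itself; it only uses standard Apache Spark MLlib APIs to train a featurization-plus-model Pipeline and save the fitted PipelineModel, which is the type of object an export tool would consume. The dataset, column names, and output path are hypothetical.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineExportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MLlibPipelineExportSketch").getOrCreate()

    // Hypothetical training data: (id, text, label).
    val training = spark.createDataFrame(Seq(
      (0L, "spark mllib pipelines", 1.0),
      (1L, "unrelated text", 0.0)
    )).toDF("id", "text", "label")

    // Featurization and the model live in one Pipeline, so the same
    // featurization logic is applied at training time and at serving time.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fit the full pipeline; the result is a PipelineModel.
    val model = pipeline.fit(training)

    // Standard MLlib persistence. A fitted PipelineModel like this is what
    // gets exported for real-time serving, streaming, or batch scoring.
    model.write.overwrite().save("/tmp/mllib-pipeline-model")

    spark.stop()
  }
}
```

Keeping featurization stages and the model in a single fitted pipeline is what allows the same artifact to be scored consistently across the serving settings listed above.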

In this webinar, we give an overview of our solution's functionality, describe its architecture, and demonstrate how to use it to deploy MLlib models to production.
 
Presenters


Sue Ann Hong
Software Engineer

Sue Ann Hong is a software engineer on the Machine Learning team at Databricks, where she contributes to MLlib and the Deep Learning Pipelines library. She received her Ph.D. from CMU, where she studied machine learning and distributed optimization, and previously worked as a software engineer at Facebook on Ads and Commerce.


Joseph Bradley
Software Engineer

Joseph Bradley is a Software Engineer and Apache Spark Committer working on MLlib at Databricks. He was previously a postdoctoral researcher at UC Berkeley, after receiving his Ph.D. in Machine Learning from Carnegie Mellon University in 2013. His research included probabilistic graphical models, parallel sparse regression, and aggregation mechanisms for peer grading in MOOCs.


Sign up today