Apache® Spark™ MLlib: From Quick Start to Scikit-Learn

On-Demand Webinar

Recorded on 02/24/2016 10:00am PT, 1:00pm ET, 6:00pm UTC


The slides and notebooks for this session are available as attachments within the webinar itself. Please start the webinar, hover over the webinar, click [Attachments], and you will be able to download all the materials.


In this webcast, Joseph Bradley from Databricks will be speaking about Apache Spark™’s distributed Machine Learning Library - MLlib.


We will start off with a quick primer on machine learning, Spark MLlib, and a quick overview of some Spark machine learning use cases. We will continue with multiple Spark MLlib quick start demos. Afterwards, the talk will transition toward the integration of common data science tools like Python pandas, scikit-learn, and R with MLlib

Presenters
  • Joseph Bradley

    Software Engineer - Databricks

    Joseph Bradley is a Software Engineer and Apache Spark Committer working on MLlib at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon U. in 2013. His research included probabilistic graphical models, parallel sparse regression, and aggregation mechanisms for peer grading in MOOCs.

  • Denny Lee

    Technology Evangelist - Databricks

    Denny Lee is a Technology Evangelist with Databricks; he is a hands-on data sciences engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems for both on-premises and cloud. Prior to joining Databricks, Denny worked as a Senior Director of Data Sciences Engineering at Concur and was part of the incubation team that built Hadoop on Windows and Azure (currently known as HDInsight).