Databricks on AWS Training Series

Join the webcasts on May 3, May 10 and May 17

Databricks provides a Unified Analytics Platform that accelerates innovation by bringing data and machine learning together. Founded by the original creators of Apache Spark™, Databricks unites Data Engineering and Data Science efforts to create automated data pipelines that enable thousands of organizations to succeed at multiple data science use cases. 

You should consider attending these trainings if you are a data engineer or data scientist interested in learning to use Apache Spark or Databricks.

The three training sessions will cover:

  • Getting Started with Apache Spark (Friday, May 03, 2019, 10 AM PDT)
  • Data Engineering and Streaming Analytics (Friday, May 10, 2019, 10 AM PDT)
  • Machine Learning (Friday, May 17, 2019, 10 AM PDT)

You can view these trainings as a set or individually, depending on your needs and interests. Signing up grants you access to all three sessions. You can attend any of them, though we highly recommend attending all three to get the full picture.

Databricks Training: Getting Started with Apache Spark™

Are the tools you’re using not running data and analytics workloads as quickly as you need? Are you looking to take advantage of the power of Apache Spark™ to support your full range of data engineering and data science projects, including data management and pipelines, streaming analytics, and machine learning?

In this free 2-hour online training, we’ll teach you how to get started with Apache Spark on Databricks:

  • Introduction to RDDs, DataFrames and Datasets for data transformation
  • Write your first Apache Spark job to load and work with data
  • Analyze your data and visualize your results in a Databricks Notebook
  • Introduction to Parquet and Delta Lake on AWS S3 for data storage

Databricks Training: Data Engineering and Streaming Analytics

Are you looking for a more efficient way to extract, transform, and load massive structured and unstructured data sets? Are the tools you’re using not running workloads as quickly as you need?

In this free 2-hour training, we will show you how to use Databricks to build an ETL pipeline from raw ingest to the data warehouse, process streaming data, and create visualizations based on continuously updated aggregate results.

Learn how to:

  • Ingest data in batches using Databricks
  • Transform data using Spark SQL and DataFrames
  • Connect to Kinesis as a streaming data source
  • Use the DataFrame API to transform streaming data
  • Output the results to a variety of sinks
  • Create dynamic visualizations from real-time analytics on streaming data

Databricks Training: Machine Learning

Do you need to speed up building machine learning models and putting them into production? Databricks helps you develop, train, and tune accurate models faster. Reach insights sooner by collaborating in shared notebooks with other analysts and data scientists. Run cutting-edge machine learning on larger data sets, leveraging the increased speed and scale enabled by MLlib’s algorithms, which are optimized for parallelization.

In this free 2-hour online training, we’ll show you how you can use MLlib and MLflow with Databricks to train your own models, run reproducible experiments, and deploy into production with fewer failing jobs.

You should attend if you are a data engineer or data scientist excited about using Apache Spark™ to streamline your data prep, model training, hyperparameter tuning, and deployment to production.