Databricks' Data Pipeline: Journey and Lessons Learned

On-Demand Webinar

In this webinar, we discuss the role and importance of ETL and the common features of an ETL pipeline. We then show how the same ETL fundamentals are applied and, more importantly, simplified within Databricks' data pipelines. Because Databricks is built on Apache Spark™, we can simplify our ETL processes using a single framework. With Databricks and AWS Kinesis, you can develop your pipeline code in notebooks, create Jobs to productionize those notebooks, and use REST APIs to turn all of this into a continuous integration workflow. We also share tips and tricks for doing ETL with Spark, along with lessons learned from our own pipeline.

Presenters
  • Burak Yavuz

    Software Engineer - Databricks

    Burak Yavuz is a Software Engineer at Databricks. He has been contributing to Spark since Spark 1.1 and is the maintainer of Spark Packages. Burak received his BS in Mechanical Engineering from Bogazici University, Istanbul, and his MS in Management Science & Engineering from Stanford.