Delta Lake - Open Source Reliability for Data Lakes

Wednesday, 29th January 2020 - 11 am CET | 10 am GMT

Delta Lake is an open source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.


Specifically, Delta Lake offers:

  • ACID transactions on Spark: Serializable isolation levels ensure that readers never see inconsistent data.
  • Scalable metadata handling: Leverages Spark's distributed processing power to handle all the metadata for petabyte-scale tables with billions of files with ease.
  • Streaming and batch unification: A table in Delta Lake is a batch table as well as a streaming source and sink. Streaming data ingest, batch historical backfill, and interactive queries all work out of the box.
  • Schema enforcement: Automatically handles schema variations to prevent insertion of bad records during ingestion.
  • Time travel: Data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments.
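
To make the schema enforcement and time travel ideas above more concrete, here is a minimal sketch in plain Python. It is a toy in-memory model and not the Delta Lake API: the class `ToyVersionedTable` and its `append` and `read` methods are hypothetical names chosen for illustration. It assumes each commit produces a new immutable snapshot, and that a batch containing any row that does not match the schema is rejected as a whole.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy in-memory table (hypothetical; NOT the Delta Lake API)."""

    schema: dict  # column name -> expected Python type
    # One snapshot per committed version; version 0 is the empty table.
    _versions: list = field(default_factory=lambda: [[]])

    def append(self, rows):
        """Validate every row, then commit the batch as one new version."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise TypeError(f"column {col!r} expects {expected.__name__}")
        # Validation passed for the whole batch, so commit all of it at once.
        self._versions.append(self._versions[-1] + list(rows))
        return len(self._versions) - 1  # the new version number

    def read(self, version=None):
        """Return a copy of the latest snapshot, or of an earlier version."""
        index = len(self._versions) - 1 if version is None else version
        return list(self._versions[index])


table = ToyVersionedTable({"id": int, "name": str})
v1 = table.append([{"id": 1, "name": "a"}])
v2 = table.append([{"id": 2, "name": "b"}])
latest = table.read()               # both rows
as_of_v1 = table.read(version=v1)   # only the first row
```

In this sketch a reader never observes a half-written batch, because the snapshot list only grows after validation has succeeded for every row. Reading an older version is simply a lookup of an earlier snapshot.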


In this webinar, you will hear from Michael Armbrust, the lead engineer responsible for Delta Lake, and have the opportunity to put questions to the regional technical team in EMEA at the end of the session.


Speaker:
Michael Armbrust
Principal Engineer, Databricks


Register Now