When we hear about Delta Lake, the first question that comes to mind is
“What is Delta Lake and how does it work?”.
“Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads”
But how is it possible to maintain transactions in the big data world? The answer is simple: the Delta format.
Delta Lake stores data in the Delta format, a versioned Parquet format with scalable metadata. It stores the data internally as Parquet files and tracks every change made to the data in a metadata file (the transaction log), so the metadata grows along with the data.
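To make the "versioned Parquet plus metadata" idea concrete, here is a toy sketch in plain Python of how a Delta table directory is laid out and how the log is replayed. This is an illustration of the concept, not the real Delta Lake implementation: the file names, action fields, and replay logic are simplified assumptions modeled on the transaction-log design described above.

```python
import json
import os
import tempfile

# A Delta table directory holds Parquet data files plus a _delta_log
# folder of JSON commit files; each commit records one table version.
table_dir = tempfile.mkdtemp()
log_dir = os.path.join(table_dir, "_delta_log")
os.makedirs(log_dir)

# Version 0: the initial write adds one data file.
commit_0 = [
    {"commitInfo": {"operation": "WRITE"}},
    {"add": {"path": "part-00000.parquet", "dataChange": True}},
]
# Version 1: an overwrite removes the old file and adds a new one.
commit_1 = [
    {"commitInfo": {"operation": "OVERWRITE"}},
    {"remove": {"path": "part-00000.parquet", "dataChange": True}},
    {"add": {"path": "part-00001.parquet", "dataChange": True}},
]

for version, actions in enumerate([commit_0, commit_1]):
    # Commit files are named by a zero-padded version number,
    # so sorting the file names sorts the versions.
    name = f"{version:020d}.json"
    with open(os.path.join(log_dir, name), "w") as f:
        f.write("\n".join(json.dumps(a) for a in actions))

# Replaying the log in version order yields the current set of live files.
live_files = set()
for name in sorted(os.listdir(log_dir)):
    with open(os.path.join(log_dir, name)) as f:
        for line in f:
            action = json.loads(line)
            if "add" in action:
                live_files.add(action["add"]["path"])
            elif "remove" in action:
                live_files.discard(action["remove"]["path"])

print(live_files)  # {'part-00001.parquet'}
```

Because readers reconstruct the table state only from committed log files, a write that never finishes its commit is simply invisible, which is the essence of how the log enables atomic operations.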
The Delta format solves several major challenges in the big data lake world. Some of them are listed below:
- Transaction management
- Incremental Load
- UPSERT and DELETE operations
- Schema Enforcement and Schema Evolution
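Two items on this list, UPSERT and DELETE, are worth a quick illustration. The sketch below shows their semantics over keyed rows in plain Python; the function names (`upsert`, `delete_where`) are illustrative stand-ins, not the Delta Lake API, which exposes this behaviour through MERGE and DELETE operations on tables.

```python
# Semantic sketch of an UPSERT (MERGE) and a conditional DELETE,
# the row-level operations Delta Lake brings to a data lake.

def upsert(target, updates, key):
    """MERGE semantics: update rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row  # update if key exists, else insert
    return list(merged.values())

def delete_where(target, predicate):
    """DELETE semantics: keep only rows that do not match the predicate."""
    return [row for row in target if not predicate(row)]

table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
table = upsert(table, [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}], key="id")
table = delete_where(table, lambda r: r["id"] == 1)
print(table)  # [{'id': 2, 'name': 'B'}, {'id': 3, 'name': 'c'}]
```

In Delta Lake these operations do not rewrite rows in place: they write new Parquet files and record the change in the transaction log, which is what makes them transactional.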
In the rest of this post, I will explain each of these features and dig deeper into the internals of Delta Lake.