How to set Kafka Heap Size?

Setting Kafka Heap size is simple, by default Kafka runs with 512MB as the heap size. For increasing the heap size, set the following environment variable and restart Kafka.

export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"

Kafka will check for KAFKA_HEAP_OPTS before it starts and if there is no value set for this variable, it assigns 512MB as the value, else it will pick up the configured value.

Delta Science – The art of designing new generation Data Lake

When we hear about Delta Lake, the first question that comes to our mind is

“What is Delta Lake and How it works ?”. 

“Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads”

But the question is how it is possible to maintain transactions in the Big Data world ?. The answer is very simple. It is using Delta Format.

Delta Lake stores data in Delta Format. Delta format is a versioned parquet format along with a scalable metadata. It stores the data as parquet internally and it tracks the changes happening to the data in the metadata file. So the metadata will also grow along with the data.

Delta format solved several major challenges in the Big Data Lake world.  Some of them are listed below

  1. Transaction management
  2. Versioning
  3. Incremental Load
  4. Indexing
  5. UPSERT and DELETE operations
  6. Schema Enforcement and Schema Evolution

I will elaborate this post by explaining each of the above features and explain more about the internals of Delta Lake.