Delta Lake is open sourced under the Apache 2.0 license, so it is free to use. It is supported in Apache Spark versions 2.4.2 and above, is easy to set up, and requires no admin skills to configure. In Databricks, Delta Lake is available by default, so no installation or configuration is needed there.

To try out a basic example, launch pyspark or spark-shell with the Delta package added. No additional installation is needed; just use one of the following commands.

For pyspark:

pyspark --packages io.delta:delta-core_2.11:0.4.0

For spark-shell:

bin/spark-shell --packages io.delta:delta-core_2.11:0.4.0
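
If you prefer a standalone PySpark script over the interactive shell, the same package coordinate can be supplied when the session is built. This is a minimal sketch, assuming the same Delta version as above; the app name is just a placeholder.

from pyspark.sql import SparkSession

# Pull in the Delta package when the session (and its JVM) starts.
spark = SparkSession.builder \
    .appName("delta-demo") \
    .config("spark.jars.packages", "io.delta:delta-core_2.11:0.4.0") \
    .getOrCreate()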

The above commands add the Delta package to the Spark context and enable Delta Lake. You can try out the following basic example in the pyspark shell.


# Create a temporary dataset
data = spark.range(0, 50)
data.write.format("delta").save("/tmp/myfirst-delta-table")
# Read the data
df = spark.read.format("delta").load("/tmp/myfirst-delta-table")
df.show()
# Overwrite the dataset with new data
data = spark.range(51, 100)
data.write.format("delta").mode("overwrite").save("/tmp/myfirst-delta-table")
# Read the data
df = spark.read.format("delta").load("/tmp/myfirst-delta-table")
df.show()
# Read the older version of the data (time travel)
df = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/myfirst-delta-table")
df.show()
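
Delta Lake 0.4.0 also exposes Python utility APIs for working with an existing table. The following is a rough sketch of inspecting the commit history and updating rows in place; the condition and expression here are purely illustrative.

from delta.tables import DeltaTable

# Load the table we created above as a DeltaTable
deltaTable = DeltaTable.forPath(spark, "/tmp/myfirst-delta-table")

# Show the commit history (one row per write, including the overwrite)
deltaTable.history().show()

# Update rows in place: add 100 to every id below 60
deltaTable.update(condition = "id < 60", set = { "id": "id + 100" })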
