Delta Lake is open source under the Apache 2.0 license, so it is free to use. It is supported in the recent versions of Apache Spark (2.4.2 and above). It is easy to set up and does not require any admin skills to configure. In Databricks, Delta Lake is available by default, so no installation or configuration is needed there.
To try out a basic example, launch pyspark or spark-shell with the Delta package added. No additional installation is needed. Just use one of the following commands.
For pyspark:
pyspark --packages io.delta:delta-core_2.11:0.4.0
For spark-shell:
bin/spark-shell --packages io.delta:delta-core_2.11:0.4.0
Either of the above commands adds the Delta package to the session and enables Delta Lake.
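If you prefer a standalone script over an interactive shell, the same package can be resolved through the session configuration instead of a shell flag. Here is a minimal sketch, assuming the script is launched with spark-submit or a pip-installed pyspark; the app name and output path are hypothetical:

from pyspark.sql import SparkSession

# Build a session that pulls in the Delta package at startup
spark = (SparkSession.builder
         .appName("delta-quickstart")  # hypothetical app name
         .config("spark.jars.packages", "io.delta:delta-core_2.11:0.4.0")
         .getOrCreate())

# Quick smoke test: write a tiny Delta table (hypothetical path)
spark.range(0, 5).write.format("delta").save("/tmp/delta-smoke-test")

With the package loaded, you can try out the following basic example in the pyspark shell.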
# Create a temporary dataset
data = spark.range(0, 50)
data.write.format("delta").save("/tmp/myfirst-delta-table")

# Read the data back
df = spark.read.format("delta").load("/tmp/myfirst-delta-table")
df.show()

# Overwrite the dataset with new values (this creates a new table version)
data = spark.range(51, 100)
data.write.format("delta").mode("overwrite").save("/tmp/myfirst-delta-table")

# Read the current data
df = spark.read.format("delta").load("/tmp/myfirst-delta-table")
df.show()

# Time travel: read the older version of the data
df = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/myfirst-delta-table")
df.show()
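Beyond plain reads and writes, Delta Lake 0.4.0 also added Python APIs for DML and utility operations on tables. A short sketch of how they can be used on the table created above, assuming the delta.tables module shipped with the 0.4.0 package; the update and delete conditions here are arbitrary, chosen only for illustration:

from delta.tables import DeltaTable

# Load the table we wrote above as a DeltaTable
deltaTable = DeltaTable.forPath(spark, "/tmp/myfirst-delta-table")

# Conditional update: add 100 to every even id (illustrative condition)
deltaTable.update(condition = "id % 2 == 0", set = {"id": "id + 100"})

# Conditional delete: drop the small ids (illustrative condition)
deltaTable.delete(condition = "id < 60")

# View the current contents as a DataFrame
deltaTable.toDF().show()

# Each write, update, and delete becomes a new version in the transaction log
deltaTable.history().show()

Every operation above is recorded as a new version of the table, which is what makes the versionAsOf time-travel read shown earlier possible.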