Azure Data Lake Storage is a scalable file system from Microsoft for storing large volumes of data, which makes it well suited for enterprise data lakes. This file system is very popular nowadays because of the huge Azure adoption happening across enterprises.

The ABFS connector and the Hadoop Azure Data Lake connector modules provide support for integration with Azure Data Lake Storage.
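
For reference, the ABFS connector addresses storage through URIs of the following form, where the container name, storage account name, and path are placeholders to be filled in with your own values:

abfs://<container>@<account-name>.dfs.core.windows.net/<path>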

These connectors are already present in HDInsight, the Hadoop distribution provided by Azure. So Azure HDInsight users do not have to make any changes in their system to interact with Azure Data Lake Storage (ADLS Gen2).
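
On clusters outside HDInsight, the ABFS connector also needs the storage account credentials before any file system call will succeed. A minimal sketch using shared-key authentication is given below; it assumes a SparkSession named spark already exists (created the same way as in the sample program further down), and ACCOUNTNAME and the account key are placeholders.

# A minimal sketch, assuming a SparkSession named 'spark' already exists.
# Shared-key authentication for the ABFS connector. Replace ACCOUNTNAME
# and <your-account-key> with your own values.
spark.sparkContext._jsc.hadoopConfiguration().set(
    'fs.azure.account.key.ACCOUNTNAME.dfs.core.windows.net',
    '<your-account-key>'
)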

For more details, refer to the Apache Hadoop website.

A sample PySpark program that interacts with Azure Data Lake Storage is given below. Here I am demonstrating delete and existence-check operations.

from pyspark.sql import SparkSession
# Author: Amal G Jose
# Reference: https://amalgjose.com
# prepare spark session
spark = SparkSession.builder.appName('filesystemoperations').getOrCreate()
# spark context
sc = spark.sparkContext
# set ADLS file system URI
sc._jsc.hadoopConfiguration().set('fs.defaultFS', 'abfs://CONTAINER@ACCOUNTNAME.dfs.core.windows.net/')
# FileSystem manager
fs = (sc._jvm.org
.apache.hadoop
.fs.FileSystem
.get(sc._jsc.hadoopConfiguration())
)
# Enter the ADLS path
path = "Your/adls/path"
# Delete the file or directory in ADLS. The second argument (True) enables recursive deletion
deletion_status = fs.delete(sc._jvm.org.apache.hadoop.fs.Path(path), True)
print("Deletion status –>", deletion_status)
# check whether the file or directory got deleted. This will return True if exists and False if does not
status = fs.exists(sc._jvm.org.apache.hadoop.fs.Path(path))
print("Status –>", status)