Azure Data Lake Storage is a very popular data storage platform from Microsoft. I was looking for a way to upload a complete directory, including all its subfolders and files, from my local machine to ADLS. I searched a lot on the internet, but I could not find a ready-made solution; the only program I found uploaded a single file to ADLS.

ADLS Gen2 supports multiprotocol access; for more details about multiprotocol access, please refer to the Azure documentation. Because of this, I was able to use the Blob API to perform the file upload.

Since my requirement was to upload a directory, I decided to write a utility myself so that I can reuse it whenever I need it. I quickly developed a simple program that recursively uploads all the files under a directory to ADLS as blobs. The only drawback of this utility is that it will not create empty directories in ADLS: every directory that contains files will be recreated, but empty directories will not be. I will handle this issue later by tweaking the program further. This is the first version of my program.
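As a rough sketch of how I might handle the empty-directory issue later: first detect the empty directories during the walk, then create them explicitly through the Data Lake API. The helper below is only a sketch; find_empty_dirs is my own placeholder name, and the commented lines assume the separate azure-storage-file-datalake package, which is not used in the program in this article.

```python
import os

def find_empty_dirs(local_path):
    """Return relative paths of leaf directories that contain no files and no subdirectories."""
    empty = []
    for root, dirs, files in os.walk(local_path):
        # Only completely empty leaf directories need explicit creation;
        # parent directories are created implicitly along with their children.
        if not files and not dirs:
            empty.append(os.path.relpath(root, local_path))
    return empty

# Hypothetical follow-up, assuming the azure-storage-file-datalake package
# (pip install azure-storage-file-datalake):
# from azure.storage.filedatalake import DataLakeServiceClient
# service = DataLakeServiceClient.from_connection_string(connect_str)
# fs = service.get_file_system_client(file_system=container_name)
# for d in find_empty_dirs(local_path):
#     fs.create_directory(d.replace(os.sep, "/"))
```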

I have used the azure-storage-blob Python package in my program. The command to install the package is given below.

pip install azure-storage-blob

The complete Python program to upload a directory or folder to Azure Data Lake Storage Gen2 (ADLS Gen2) is given below. It uses the Azure Blob Storage API to upload the files. In the next version, I will enhance this utility to perform high-speed parallel uploads to ADLS.

import os

from azure.storage.blob import BlobServiceClient

# Install the following package before running this program
# pip install azure-storage-blob


def upload_data_to_adls():
    """
    Function to upload a local directory to ADLS
    :return:
    """
    # Azure Storage connection string
    connect_str = ""
    # Name of the Azure container
    container_name = ""
    # The path prefix to be removed from the local directory path while uploading it to ADLS
    path_to_remove = ""
    # The local directory to upload to ADLS
    local_path = ""
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    # The code block below iteratively traverses the files and directories under the given folder and uploads the files to ADLS.
    for r, d, f in os.walk(local_path):
        for file in f:
            file_path_on_local = os.path.join(r, file)
            # Strip only the leading prefix, and normalise separators since blob names always use "/"
            file_path_on_azure = file_path_on_local.replace(path_to_remove, "", 1).replace(os.sep, "/")
            blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_path_on_azure)
            print("uploading file --->", file_path_on_local)
            with open(file_path_on_local, "rb") as data:
                blob_client.upload_blob(data)


if __name__ == '__main__':
    # invoking the upload_data_to_adls() function.
    upload_data_to_adls()
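To give an idea of the direction the parallel version could take, here is a sketch using ThreadPoolExecutor from Python's standard library, where each worker uploads one file. The names upload_one and upload_dir_parallel are placeholders of mine, not part of azure-storage-blob, and the client object is passed in rather than built from a connection string.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def upload_one(blob_service_client, container_name, local_file, blob_path):
    # Each worker opens its own file handle and streams it to a blob
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_path)
    with open(local_file, "rb") as data:
        blob_client.upload_blob(data, overwrite=True)
    return blob_path

def upload_dir_parallel(blob_service_client, container_name, local_path, max_workers=8):
    tasks = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for root, _, files in os.walk(local_path):
            for name in files:
                local_file = os.path.join(root, name)
                # Blob names always use forward slashes, regardless of the local OS
                blob_path = os.path.relpath(local_file, local_path).replace(os.sep, "/")
                tasks.append(pool.submit(upload_one, blob_service_client,
                                         container_name, local_file, blob_path))
    # The executor has finished all tasks by this point; collect the uploaded blob names
    return [t.result() for t in tasks]
```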

I hope this program helps someone 🙂 . Thank you for reading my article. Please like this article and subscribe to my blog if you would like to be notified of future articles.