Azure Data Lake Storage is a popular data storage platform from Microsoft. I was looking for a way to upload a complete directory, including all its subfolders and files, from my local machine to ADLS. I searched the internet extensively, but I could not find a ready-made solution; the only program I found uploaded a single file to ADLS.
ADLS Gen2 supports multiprotocol access. For more details about multiprotocol access, please refer to the Azure documentation. Because of this, I was able to use the Blob API to perform the file upload.
Since my requirement was to upload a directory, I decided to write a utility myself so that I can reuse it whenever I need it. I quickly developed a simple utility that recursively uploads all the files to ADLS as blobs. Its only drawback is that it does not create empty directories in ADLS: directories that contain files are recreated, but empty directories are not. I will handle this issue later by tweaking the program further. This is the first version of my program.
I have used the azure-storage-blob Python package in my program. The command to install the package is given below.
pip install azure-storage-blob
The complete Python program to upload a directory or folder to Azure Data Lake Storage Gen2 (ADLS Gen2) is given below. It uses the Azure Blob Storage API to upload the files. In the next version, I will enhance this utility to perform high-speed parallel uploads to ADLS.
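A minimal sketch of such a recursive upload utility is shown here. The connection string, container name, and local path are placeholders you would replace with your own values; the helper names (`to_blob_name`, `upload_directory`) are my own for illustration. As described above, empty directories are skipped, since blob storage has no concept of an empty folder.

```python
import os


def to_blob_name(local_path, local_dir, prefix=""):
    """Map a local file path to a blob name that mirrors the folder layout."""
    relative_path = os.path.relpath(local_path, local_dir)
    # Use forward slashes so the directory hierarchy shows up in ADLS.
    return os.path.join(prefix, relative_path).replace(os.sep, "/")


def upload_directory(local_dir, container_client, prefix=""):
    """Recursively upload every file under local_dir as a blob.

    Only paths that contain files are recreated in ADLS; empty
    directories are not, because blobs cannot represent them.
    """
    for root, _dirs, files in os.walk(local_dir):
        for file_name in files:
            local_path = os.path.join(root, file_name)
            blob_name = to_blob_name(local_path, local_dir, prefix)
            with open(local_path, "rb") as data:
                container_client.upload_blob(
                    name=blob_name, data=data, overwrite=True
                )


if __name__ == "__main__":
    # Placeholder credentials -- substitute your own account details.
    from azure.storage.blob import BlobServiceClient

    service_client = BlobServiceClient.from_connection_string(
        "<your-storage-connection-string>"
    )
    container_client = service_client.get_container_client("<your-container-name>")
    upload_directory("/path/to/local/folder", container_client)
```

Keeping the blob-name computation in its own small function makes the path-mapping logic easy to test without touching storage at all.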
I hope this program helps someone 🙂. Thank you for reading my article. Please like this article and subscribe to my blog if you would like to be notified of future articles.