Programmatic Data Upload to Amazon S3

Date: April 23, 2015
Author: Amal G Jose

Amazon S3 (Simple Storage Service) is Amazon's service for storing data, and it is inexpensive for what it offers. Data can be uploaded to and downloaded from S3 easily, either with existing tools or from your own programs. This post walks through a sample Python program that uploads a file to S3.

Files can be uploaded to S3 in two ways: a normal upload or a multipart upload. A normal upload sends the whole file in a single request, which takes a long time for large files and is therefore not well suited to them. For large files, multipart upload is the better option: the file is divided into chunks, the chunks can be sent in parallel, and S3 assembles them into a single object once the upload is completed.

The program below uses the normal approach and relies on the boto library to do the upload. The bucket name and the access keys are placeholders; replace them with your own values.

	__author__ = 'Amal G Jose'

	import boto
	import ntpath
	from boto.s3.connection import S3Connection

	class SimpleDataUploader(object):

	    ## Initializer: stores the settings and opens the S3 connection
	    def __init__(self):
	        try:
	            self.s3_bucket = "mybucket"
	            self.s3_path = "data"  # not used by this simple example
	            self.aws_access_key = "XXXXXXXXXXXX"
	            self.aws_secret_key = "XXXXXXXXXXXX"
	            self.s3_conn = boto.connect_s3(aws_access_key_id=self.aws_access_key,
	                                           aws_secret_access_key=self.aws_secret_key,
	                                           calling_format=boto.s3.connection.OrdinaryCallingFormat())
	        except Exception as e:
	            print("Exception occurred in the initializer : " + str(e))

	    ## This method will upload files to S3
	    def upload_to_s3(self, local_file_name):
	        try:
	            # Use only the file name (without its directory) as the S3 key
	            path, file_name = ntpath.split(local_file_name)
	            bucket = self.s3_conn.get_bucket(self.s3_bucket)
	            key = bucket.new_key(file_name)
	            key.set_contents_from_filename(local_file_name)
	            print("File uploaded successfully")
	        except Exception as e:
	            print("Exception while uploading file to S3 : " + str(e))

	if __name__ == '__main__':
	    uploader = SimpleDataUploader()
	    uploader.upload_to_s3("data.txt")
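
For comparison, here is a rough sketch of the multipart approach described earlier. It is not part of the original program. The function names `iter_parts` and `multipart_upload` and the 5 MB default part size are my own choices for illustration, and the `bucket` argument is assumed to be a boto bucket object such as the one returned by `get_bucket`. To keep it short, the parts are sent one after another; a real uploader would typically hand them to several workers to get the parallelism.

```python
import io
import os


def iter_parts(file_size, part_size):
    """Yield (part_number, offset, length) for each chunk of the file.

    Part numbers start at 1, as S3 multipart uploads require.
    """
    part_number = 1
    offset = 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        yield part_number, offset, length
        part_number += 1
        offset += length


def multipart_upload(bucket, key_name, local_file_name,
                     part_size=5 * 1024 * 1024):
    """Upload a (non-empty) file to S3 in parts, one part at a time.

    `bucket` is assumed to be a boto Bucket. Every part except the
    last one must be at least 5 MB, hence the default part size.
    """
    file_size = os.path.getsize(local_file_name)
    mp = bucket.initiate_multipart_upload(key_name)
    try:
        with open(local_file_name, "rb") as source:
            for part_number, offset, length in iter_parts(file_size, part_size):
                source.seek(offset)
                chunk = io.BytesIO(source.read(length))
                mp.upload_part_from_file(chunk, part_num=part_number)
        # Ask S3 to assemble the uploaded parts into one object
        mp.complete_upload()
    except Exception:
        # Abort so that S3 does not keep the orphaned parts around
        mp.cancel_upload()
        raise
```

With the `SimpleDataUploader` class above, this could be called as `multipart_upload(uploader.s3_conn.get_bucket("mybucket"), "data.txt", "data.txt")`. Unlike the class methods, this sketch re-raises the exception after cancelling the upload, so the caller decides how to report the failure.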


2 thoughts on “Programmatic Data Upload to Amazon S3”

naveen says:

April 25, 2015 at 5:16 am

Hi,
I am looking for help installing Oozie on an EMR cluster (Oozie 4.0.0 or 4.0.1 with Apache Hadoop 2.4). Can you please help me with this? I found two scripts on GitHub, but they did not solve it…

1. amalgjose says:
  
  April 25, 2015 at 7:00 am
  
  Hi,
  Oozie installation is pretty simple. You have to install Oozie on the master node. For that, you can either write a custom bootstrap script or do a manual installation. A sample custom script is available at this URL:
  https://github.com/lila/emr-oozie-sample/blob/master/config/config-oozie.sh
  
  It will not work directly, because the URLs referenced in the script are no longer available. Take the script as a starting point and update it for the current Oozie release in the Apache repository, changing the folder names accordingly.
  