S3 (Simple Storage Service) is a data storage service provided by Amazon. It is a very useful service at a low price. Data can be uploaded to and downloaded from S3 easily, either with tools or from a program. Here I explain a sample Python program for uploading a file to S3.
Files can be uploaded to S3 in two ways: a normal upload and a multipart upload. A normal upload sends the file serially, so it takes more time and is not suitable for large files. For large files, multipart upload is the best option. It divides the file into chunks, sends the chunks in parallel, and S3 assembles them into a single object.
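The chunking step of a multipart upload can be sketched as below. This is only a minimal illustration using the standard library, not the upload itself: the helper name `plan_parts` and the 5 MB part size are assumptions made for this example (S3 requires every part except the last to be at least 5 MB). In boto, each planned part would then be sent with the multipart upload object returned by `bucket.initiate_multipart_upload()`.

```python
import math

# Assumed part size for illustration: 5 MB.
PART_SIZE = 5 * 1024 * 1024


def plan_parts(file_size, part_size=PART_SIZE):
    """Return (part_number, offset, length) tuples that cover the whole file.

    Hypothetical helper: it only computes the byte ranges; sending the
    parts (possibly in parallel) is left to the upload code.
    """
    parts = []
    part_count = int(math.ceil(file_size / float(part_size)))
    for index in range(part_count):
        offset = index * part_size
        # The last part may be shorter than part_size.
        length = min(part_size, file_size - offset)
        # S3 part numbers start at 1.
        parts.append((index + 1, offset, length))
    return parts


if __name__ == "__main__":
    # A 12 MB file is split into two full parts and one 2 MB part.
    for part in plan_parts(12 * 1024 * 1024):
        print(part)
```

Each tuple tells the uploader where to seek in the local file and how many bytes to read for that part.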
This program uses the normal approach for sending files to S3. I used the boto library for uploading the files.
__author__ = 'Amal G Jose'

import ntpath

import boto
from boto.s3.connection import OrdinaryCallingFormat


class SimpleDataUploader(object):

    # Initializer: stores the settings and opens the S3 connection
    def __init__(self):
        try:
            self.s3_bucket = "mybucket"
            self.s3_path = "data"
            self.aws_access_key = "XXXXXXXXXXXX"
            self.aws_secret_key = "XXXXXXXXXXXX"
            self.s3_conn = boto.connect_s3(
                aws_access_key_id=self.aws_access_key,
                aws_secret_access_key=self.aws_secret_key,
                calling_format=OrdinaryCallingFormat())
        except Exception as e:
            print("Exception occurred in the initializer : " + str(e))

    # This method will upload files to S3
    def upload_to_s3(self, local_file_name):
        try:
            # Use the base name of the local file as the S3 key name
            path, file_name = ntpath.split(local_file_name)
            bucket = self.s3_conn.get_bucket(self.s3_bucket)
            key = bucket.new_key(file_name)
            # Normal (single request) upload of the whole file
            key.set_contents_from_filename(local_file_name)
            print("File uploaded successfully")
        except Exception as e:
            print("Exception while uploading file to S3 : " + str(e))


if __name__ == '__main__':
    uploader = SimpleDataUploader()
    uploader.upload_to_s3("data.txt")
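The program above sets `s3_path` but uses only the file name as the S3 key. As a rough sketch of how the key name is derived, and of how a prefix such as `s3_path` could be combined with it, consider the helper below. The function name `build_key_name` is hypothetical and is not part of the original program or of boto.

```python
import ntpath


def build_key_name(local_file_name, s3_path=""):
    """Derive an S3 key name from a local path, optionally under a prefix.

    Hypothetical helper for illustration only.
    """
    # ntpath.split accepts both "\" and "/" as path separators,
    # so Windows and Unix style paths give the same file name.
    _, file_name = ntpath.split(local_file_name)
    if s3_path:
        # Avoid a doubled or leading slash in the resulting key.
        return s3_path.strip("/") + "/" + file_name
    return file_name


if __name__ == "__main__":
    print(build_key_name("C:\\logs\\data.txt"))
    print(build_key_name("/home/user/data.txt", "data"))
```

The returned string could be passed to `bucket.new_key()` in place of the bare file name if the uploaded files should be grouped under a common prefix.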
Hi,
I am looking for a way to install Oozie on an EMR cluster (Oozie 4.0.0 or 4.0.1 with Apache Hadoop 2.4). Can you please help me with this? I found two scripts on GitHub, but they did not solve it…
Hi,
Oozie installation is pretty simple. You have to install Oozie on the master node. For that, you can either write a custom bootstrap script or do a manual installation. A sample custom script is available at this URL:
https://github.com/lila/emr-oozie-sample/blob/master/config/config-oozie.sh
This will not work directly, because the URLs mentioned in the script are no longer available. So you can take this script and update it for the newer Oozie release available in the Apache repository. Change the folder names accordingly.