Python code to list all the running EC2 instances across all regions in an AWS account

This code snippet will help you get the list of all running EC2 instances across all regions in an AWS account. I have used the Python boto3 package for developing the code. The code dynamically picks up all the AWS EC2 regions, so it will work without any modification even if a new region is added to AWS.

Note: This program contains only the basic API calls needed to list the instance details; proper coding conventions are not followed. 🙂
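
A minimal sketch of this approach with boto3 is given below. It assumes the default credential chain is already configured (for example via the AWS CLI), and the printed fields are just examples.

import boto3

# Fetch the list of EC2 regions dynamically, so a newly added region
# is picked up automatically without any code change.
ec2 = boto3.client('ec2')
regions = [r['RegionName'] for r in ec2.describe_regions()['Regions']]

for region in regions:
    client = boto3.client('ec2', region_name=region)
    # Ask only for instances in the 'running' state.
    paginator = client.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                print(region, instance['InstanceId'], instance['InstanceType'])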

Programmatic way to reboot EC2 instances

Sometimes we might have to reboot EC2 instances. If the requirement is to restart EC2 instances regularly, we can achieve it with a small piece of code. I came across a similar requirement, and a portion of the code I used is given below.
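
A hedged sketch of such a script with boto3 is shown below; the region and instance IDs are placeholders.

import boto3

# Placeholder IDs; replace with the instances that need a regular restart.
INSTANCE_IDS = ['i-0123456789abcdef0']

client = boto3.client('ec2', region_name='us-east-1')

# reboot_instances performs a soft reboot of the given instances.
# Scheduling this script (e.g. via cron) gives a regular restart.
client.reboot_instances(InstanceIds=INSTANCE_IDS)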


Programmatic Data Upload to Amazon S3

S3 (Simple Storage Service) is a service provided by Amazon for storing data. It is very useful and available at a low price. Data can be uploaded to and downloaded from S3 very easily using various tools as well as programmatically. Here I explain a sample Python program for uploading a file to S3.

Files can be uploaded to S3 in two ways: the normal upload and the multipart upload. A normal upload sends the file serially and is not suitable for large files, since it takes more time. For large files, multipart upload is the better option: it divides the file into chunks, uploads them in parallel, and reassembles them in S3.
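
Just to illustrate the multipart approach (this is not the program described below), a rough sketch with the boto2 API could look like the following; the bucket name, key name, and part size are assumptions, and the parts are sent serially here for brevity even though they can be uploaded in parallel.

import math
import os

import boto

PART_SIZE = 50 * 1024 * 1024  # 50 MB parts (assumed size)

conn = boto.connect_s3()
bucket = conn.get_bucket('my-example-bucket')  # hypothetical bucket
mp = bucket.initiate_multipart_upload('backups/big-file.dat')

source = '/tmp/big-file.dat'
size = os.stat(source).st_size
part_count = int(math.ceil(size / float(PART_SIZE)))

with open(source, 'rb') as fp:
    for i in range(part_count):
        # Upload one chunk per call; S3 reassembles them on completion.
        mp.upload_part_from_file(fp, part_num=i + 1,
                                 size=min(PART_SIZE, size - i * PART_SIZE))
mp.complete_upload()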

This program uses the normal approach for sending files to S3. Here I used the boto library for the upload.
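
A minimal sketch of that normal upload is given below; the bucket name and file paths are placeholders.

import boto

conn = boto.connect_s3()  # picks up credentials from the boto config or environment
bucket = conn.get_bucket('my-example-bucket')  # hypothetical bucket name

# A key is the object name inside the bucket.
key = bucket.new_key('uploads/sample.txt')
key.set_contents_from_filename('/tmp/sample.txt')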

Launching an EMR cluster using a Python program

Amazon’s EMR is a very easy way to launch a Hadoop cluster. Amazon provides a console as well as an API interface for launching clusters. Boto is a Python library for dealing with Amazon Web Services; it is not only for EMR but covers most AWS services. Here I am sharing a small program for launching an EMR cluster using Python boto. This program helps in situations where automation is required.
In this program, the Hadoop cluster will be launched with services such as Pig, Hive, Impala, and Ganglia, along with some user-defined installations.
Bootstrapping is a process by which we can add our own custom installations while launching the cluster.
Suppose we want to install our own custom software on all the nodes of the EMR cluster; doing the same process manually on every node would be difficult. The bootstrap option solves this problem in a very simple way. All we need to do is write a shell script containing the custom steps, put it in an S3 bucket, and specify that script while launching the cluster.

For writing AWS-related programs, you can check the Python boto API. Boto is an open-source Python library, and I wrote this code by referring to its documentation. A good way to learn Python coding conventions is to follow the conventions used in the boto source code; the coding standards used in this code are similar to those in boto.

In this program, if you don’t have a bootstrap step, you can keep it as None.

The code is given below. You can also get the code from GitHub.
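
In case the embedded snippet does not render here, a hedged reconstruction with the boto2 EMR API is given below. The bucket paths, key pair name, instance types, and AMI version are assumptions, Ganglia is added through the standard install-ganglia bootstrap action, and the Impala setup is left out for brevity.

import boto.emr
from boto.emr.bootstrap_action import BootstrapAction
from boto.emr.instance_group import InstanceGroup
from boto.emr.step import InstallHiveStep, InstallPigStep

conn = boto.emr.connect_to_region('us-east-1')

# Cluster layout: one master and two core nodes (example sizes).
instance_groups = [
    InstanceGroup(1, 'MASTER', 'm1.large', 'ON_DEMAND', 'master'),
    InstanceGroup(2, 'CORE', 'm1.large', 'ON_DEMAND', 'core'),
]

# Custom bootstrap action: a shell script kept in S3 (hypothetical path).
# If you don't have a bootstrap step, pass bootstrap_actions=None instead.
bootstrap_actions = [
    BootstrapAction('custom-install',
                    's3://my-example-bucket/bootstrap/install.sh', []),
    BootstrapAction('install-ganglia',
                    's3://elasticmapreduce/bootstrap-actions/install-ganglia', []),
]

cluster_id = conn.run_jobflow(
    name='sample-emr-cluster',
    log_uri='s3://my-example-bucket/emr-logs/',   # assumed log bucket
    ec2_keyname='my-keypair',                     # assumed EC2 key pair
    ami_version='3.11.0',                         # example AMI version
    instance_groups=instance_groups,
    bootstrap_actions=bootstrap_actions,
    steps=[InstallHiveStep(), InstallPigStep()],  # install Hive and Pig
    keep_alive=True,
    enable_debugging=True,
)
print('Launched cluster:', cluster_id)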

What is EMR?

EMR (Elastic MapReduce) is a cloud service provided by Amazon.

We can launch Hadoop clusters of our desired size in a few minutes using this service.

We can increase or decrease the number of nodes in a running cluster without any disturbance, which is why it is called Elastic. It is very simple to operate and doesn’t require much administration skill. We pay only for what we use, with no need for a server room, cooling mechanism, power backup, etc., and we get everything quickly at an affordable price. We can configure Hadoop and Hadoop ecosystem components such as Hive, Pig, and Impala in an EMR cluster.

Now Shark and Spark are also available with EMR. If we need any additional services installed in our cluster, we can create our own custom bootstrap script for installing those services and add the script while launching the cluster.

There are three types of nodes in an EMR cluster: Master, Core, and Task.

The master node runs the master daemons of the Hadoop cluster: the Namenode and Jobtracker for MRv1, or the Namenode and Resource Manager in the case of YARN. Core nodes run the Datanode and Tasktracker for MRv1, or the Datanode and Node Manager for YARN. Task nodes run only the processing daemons, i.e. the Tasktracker or Node Manager. After launching a cluster we can increase the number of core nodes and task nodes, but we can decrease only the number of task nodes. We can’t reduce the number of core nodes, because core nodes run datanodes that store data, and decreasing the number of datanodes may result in data loss.

A super cool Python library called Boto is available for dealing with EMR.

Why can’t EMR be launched in all types of VPCs?

To launch an EMR cluster, the VPC must have an internet gateway and a subnet. So if internet access is restricted in the VPC, EMR cannot be launched. The reason is that, while launching, EMR contacts remote locations to download the required software and installation scripts. If the internet is not available, those connections are blocked, which results in installation failure.
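
A quick way to verify this precondition is to check whether the VPC has an internet gateway attached. A small boto3 sketch (the VPC ID is a placeholder):

import boto3

vpc_id = 'vpc-0123456789abcdef0'  # placeholder VPC ID
ec2 = boto3.client('ec2', region_name='us-east-1')

# Internet gateways record an attachment for each VPC they are attached to.
response = ec2.describe_internet_gateways(
    Filters=[{'Name': 'attachment.vpc-id', 'Values': [vpc_id]}]
)
if response['InternetGateways']:
    print('Internet gateway attached; EMR can be launched in this VPC.')
else:
    print('No internet gateway; EMR launch will fail in this VPC.')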

Python program to list all Redshift clusters across all regions

Below is a Python program to list the details of all the Redshift clusters across all regions.
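
A minimal sketch with boto3 is given below; it iterates over every region where Redshift is offered and prints a few example fields for each cluster.

import boto3

session = boto3.session.Session()

# get_available_regions returns every region in which the service is offered.
for region in session.get_available_regions('redshift'):
    client = boto3.client('redshift', region_name=region)
    paginator = client.get_paginator('describe_clusters')
    for page in paginator.paginate():
        for cluster in page['Clusters']:
            print(region,
                  cluster['ClusterIdentifier'],
                  cluster['NodeType'],
                  cluster['ClusterStatus'])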