Python code to list all the running EC2 instances across all regions in an AWS account

This code snippet will help you to get the list of all running EC2 instances across all regions in an AWS account. I have used python boto3 package for developing the code. This code will dynamically pick up all the aws ec2 regions. So the code will work perfectly without any modification even if a new region gets added to the AWS.

Note: Only the basic api calls just to list the instance details are mentioned in this program . Proper coding convention is not followed . 🙂

How to attach a new EBS to an EC2 instance

Nowadays majority of us are using some cloud services. Amazon Web Services is one of the popular provider among all the other cloud service providers. Just like we upgrade our harddisk or mounting new drives to physical machines, we can attach new block storages to Amazon EC2 also. Amazon provides a service called EBS (Elastic Block Storage). There are various types of EBS with various speed and cost. Example are magnetic, SSD etc.

Attaching a new EBS to a running EC2 instance is very simple. We can do this programatically as well as using the console. Here I am explaining the basic steps to perform this operation using the console.

  1. Launch an EBS in the same region and same availability zone as that of the EC2 instance
  2. Note down the instance id of the EC2 instance
  3. Attach the EBS to the EC2. This can be done by using the attach option available in the EBS. The EBS will be listed under the Volumes section in EC2 service page of AWS console.
  4. Login to the EC2 instance and switch to the root user
  5. Type lsblk to list all the block devices
  6. Identify the new block device.
  7. Create a new directory to mount the EBS.
  8. Format the newly mounted storage. The command is mkfs -t ext4 /dev/<device-name>
  9. Mount the EBS on the directory. The command is mount /dev/<device-name>  <mount-dir>
  10. Check for the new storage. The command is df -h



Recovering a corrupted EC2 instance

Amazon Web Services is one of the most popular cloud service providers. I am a customer of Amazon. I like the services provided by Amazon very much. Compared to other cloud service providers AWS is simple, secure and advanced. I use EC2 machines for my project related activities as well as my personal experiments. Since I mostly work on open source software, 99.99 % of my EC2 instances are Linux instances. The only way to access these instances is through ssh. I use putty as the ssh client. If something happens to the ssh server, we will not be able to access the server. Sometimes the ssh server crashes due to overload. This can be resolved by rebooting the instance.

Sometimes because of wrong configs in the sshd config file, the ssh server may stop. The ssh server will not start until we make that file proper and restart the service. But for making these changes we have to access the machine.

By default we don’t have direct root login into the machine. We usually login to one user which is a sudo user and using sudo privileges, we access the root. If something happens to the sudoers file or if some wrong entry made in sudoers file, the root access will be revoked.

These are some of the commonly occurred situations where users loose access or super user privilege in the ec2 machine. Most of the users terminate and leave the instance in this situation.

If the instance is an EBS backed instance, we don’t have to terminate and leave the machines in this kind of situations. We can recover these instances. It is simple and can be done in few steps. If the instance is with ephemeral storage, we cannot do anything, because shutting down will clear all the data in the instance.

  1. Start a new instance in the same availability zone as that of the EBS of the broken machine. Micro or nano instance type is fine. If you already have an instance, no need of this instance.
  2. Stop the broken machine. Note down the mount locations
  3. Detach the EBS from the instance.
  4. Attach the EBS to the second EC2 instance (The newly launched one)
  5. Mount the EBS to some directory in the second EC2 instance.
  6. Navigate through the files and directories and make the required changes.
  7. Unmount the EBS
  8. Detach the EBS from the second instance
  9. Attach the EBS to the first instance
  10. Use the same mount location as that of the orginal
  11. Start the instance.

This should fix the problem.

Programmatic way to reboot EC2 instances

Sometimes we might have to reboot EC2 instances. If the requirement is to restart EC2 instances regularly, we can achieve it by writing a small piece of code. I also came across a similar requirement and a portion of the code I used is given below.


Programmatic Data Upload to Amazon S3

S3 is a service provided by Amazon for storing data. The full form is Simple Storage Service. S3 is a very useful service for less price. Data can be uploaded to and downloaded from S3 very easily using some tools as well as program. Here I am explaining  a sample program for uploading file to S3 using a python program.

Files can be uploaded to S3 in two approaches. One is the normal upload and another is the multipart upload. Normal upload sends the file serially and is not suitable for large files. It will take more time. For large files, multipart upload is the best option. It will upload the file by dividing it into chunks and sends it in parallel and collects it in S3.

This program is using the normal approach for sending the files to S3. Here I used the boto library for uploading the files.

Program to change the permission of files present in an Amazon S3 bucket

The program below gives read permission recursively to all the files present in an S3 bucket.

Service Nanny in AWS EMR

Service nanny is a service that runs in all the nodes of AWS EMR that controls the operation of daemons in each node.If a process gets killed because of OOM killer or overload etc, it restarts immediately and ensures that the service is alive. This service ensures that the cluster services are always alive without the problems created by unexpected exists in the services. So even if you kill a process or stop a process, it will get automatically restarted.

Recently I faced an issue with impala in AWS EMR. I was getting an error as described in this post. I was using a small  3 node EMR cluster. Instead of creating a new cluster I thought of restarting the impala daemon by specifying the additional arguments. But I was not able to perform this because the service nanny was performing the daemon start before I performing the start. So I stopped the service nanny in all the nodes and restarted impala with extra arguments and then restarted the service nanny.

We can modify service nanny control behavior by editing the config files present in /etc/service-nanny/ directory. You can see config files for each service controlled by service nanny. You can add/remove/modify the control actions  by adding/removing/modifying the config files.

How to validate a file in S3

S3 is a storage service provided by Amazon. We can use this as a place to store, backup or archive our data. S3 is a storage which is accessible from the public network. So the data reaches S3 through internet. So while doing the data transmission to S3, one important thing that we have to ensure is the correctness of the data. Because if the data gets

corrupted while transferring, it will be a big problem. So we have to ensure the correctness of the data. This is possible only by comparing the S3 copy with the master copy. But how to achieve this ???

In local file system we can do the file comparison by calculating the checksum. But in S3 how we will perform this ?.
Calculating checksum involves reading the complete file. But do we have a provision to calculate the checksum in S3.?

Yes we have. We don’t have to calculate again, but use one of the properties of an S3 file to compare it with the source file. Every S3 file has a property called ETag. This etag is a checksum that is calculated while the file is transferred to S3. The tricky part is the way in which Etag is calculated. Etag can be calculated in different ways. So the Etag of a file may be different depending upon the way we transfer the file.

The funda is simple. The Etag of a file depends on the chunk size in which the file gets transferred to S3. So for validating a file, we have to find the etag of the S3 file and calculate a checksum of the local file using the same logic that is used to calculate the Etag of that file in S3. The etag calculation of files uploaded to S3 in normal way is simple and it will be equal to normal md5 checksum. But if we use multipart upload, then the Etag differs. Now the question arises, what is multipart upload ??

Inorder to transfer large files to S3, it is divide it into small parts and upload the parts in parallel and assemble at the S3 side. If we transmit a single large file directly, if some failure happens, the entire file transfer fails and restartability will be also difficult. But if we divide the large file into smaller chunks and transfer it in parallel, the transmission speed increases, the reliability also increases. If the transfer of a chunk fails, we can retry that chunk alone and hence improves the restartability.

Here I am giving an example of checking the Etag of a file and comparing it with the normal md5 checksum of the file.

Suppose I have an S3 bucket with the name checksum-testand I have a file with with the name sample.txt which is of 100 MB inside the checksum-test bucket in a location file/sample.txt

Then the bucket name is checksum-test
full key name will be file/sample.txt

Python program to list all redshift clusters across all regions

Python program to list the details of all the redshift clusters across all regions.