How to enable access to a S3 object in a secure way ?

By default all the objects in an AWS bucket are private. Sometimes we may come across the requirement to share an object (file) present in s3 to others and don’t want to share aws access credentials.

In these scenarios, AWS provides a simple way to provide access by generating a timebound signed URLs using the access credentials. These URLs will be active only for the specified time period and the access will be revoked after that time period. I came across a similar requirement and the solution is shared below.

Python code to list all the running EC2 instances across all regions in an AWS account

This code snippet will help you to get the list of all running EC2 instances across all regions in an AWS account. I have used python boto3 package for developing the code. This code will dynamically pick up all the aws ec2 regions. So the code will work perfectly without any modification even if a new region gets added to the AWS.

Note: Only the basic api calls just to list the instance details are mentioned in this program . Proper coding convention is not followed . 🙂

How to attach a new EBS to an EC2 instance

Nowadays majority of us are using some cloud services. Amazon Web Services is one of the popular provider among all the other cloud service providers. Just like we upgrade our harddisk or mounting new drives to physical machines, we can attach new block storages to Amazon EC2 also. Amazon provides a service called EBS (Elastic Block Storage). There are various types of EBS with various speed and cost. Example are magnetic, SSD etc.

Attaching a new EBS to a running EC2 instance is very simple. We can do this programatically as well as using the console. Here I am explaining the basic steps to perform this operation using the console.

  1. Launch an EBS in the same region and same availability zone as that of the EC2 instance
  2. Note down the instance id of the EC2 instance
  3. Attach the EBS to the EC2. This can be done by using the attach option available in the EBS. The EBS will be listed under the Volumes section in EC2 service page of AWS console.
  4. Login to the EC2 instance and switch to the root user
  5. Type lsblk to list all the block devices
  6. Identify the new block device.
  7. Create a new directory to mount the EBS.
  8. Format the newly mounted storage. The command is mkfs -t ext4 /dev/<device-name>
  9. Mount the EBS on the directory. The command is mount /dev/<device-name>  <mount-dir>
  10. Check for the new storage. The command is df -h

 

 

Recovering a corrupted EC2 instance

Amazon Web Services is one of the most popular cloud service providers. I am a customer of Amazon. I like the services provided by Amazon very much. Compared to other cloud service providers AWS is simple, secure and advanced. I use EC2 machines for my project related activities as well as my personal experiments. Since I mostly work on open source software, 99.99 % of my EC2 instances are Linux instances. The only way to access these instances is through ssh. I use putty as the ssh client. If something happens to the ssh server, we will not be able to access the server. Sometimes the ssh server crashes due to overload. This can be resolved by rebooting the instance.

Sometimes because of wrong configs in the sshd config file, the ssh server may stop. The ssh server will not start until we make that file proper and restart the service. But for making these changes we have to access the machine.

By default we don’t have direct root login into the machine. We usually login to one user which is a sudo user and using sudo privileges, we access the root. If something happens to the sudoers file or if some wrong entry made in sudoers file, the root access will be revoked.

These are some of the commonly occurred situations where users loose access or super user privilege in the ec2 machine. Most of the users terminate and leave the instance in this situation.

If the instance is an EBS backed instance, we don’t have to terminate and leave the machines in this kind of situations. We can recover these instances. It is simple and can be done in few steps. If the instance is with ephemeral storage, we cannot do anything, because shutting down will clear all the data in the instance.

  1. Start a new instance in the same availability zone as that of the EBS of the broken machine. Micro or nano instance type is fine. If you already have an instance, no need of this instance.
  2. Stop the broken machine. Note down the mount locations
  3. Detach the EBS from the instance.
  4. Attach the EBS to the second EC2 instance (The newly launched one)
  5. Mount the EBS to some directory in the second EC2 instance.
  6. Navigate through the files and directories and make the required changes.
  7. Unmount the EBS
  8. Detach the EBS from the second instance
  9. Attach the EBS to the first instance
  10. Use the same mount location as that of the orginal
  11. Start the instance.

This should fix the problem.

Programmatic way to reboot EC2 instances

Sometimes we might have to reboot EC2 instances. If the requirement is to restart EC2 instances regularly, we can achieve it by writing a small piece of code. I also came across a similar requirement and a portion of the code I used is given below.

 

Load Balancers – HA Proxy and ELB

ELB

HAProxy

Earlier I wondered how the sites like google handles the large number of requests reaching there. Later I came to know that there is a concept of load balancing. Using load balancing we can keep multiple servers in the back end and route the incoming requests to the back end servers. This will ensure faster response as well as high availability. This Load balancers play a very important role. There are a lot of opensource load balancers as well as paid services. HAProxy is one of the opensource load balancer. Amazon is providing a Load Balancer as a service known as Elastic Load Balancer (ELB).Using the load balancer, we can handle very large number of requests in a very reliable and optimal way. We can use this load balancer in Impala for load balancing the requests hitting the impala server. For on-premise environments, we can configure HAProxy and for cloud environments, we can use ELB.The ELB is a ready to use service, we just have to add the details of ports to be forwarded and the listener machines. HA Proxy is a very simple application that is available in the linux repositories. It is very easy to configure also.

Impala Scratch directory issue

In impala while running queries over large data, sometimes we may get an error like this.

WARNINGS: Create file /tmp/impala-scratch/94869b99d0d6457:765d2bc009a914ad_94869b99d0d6457:765d2bc009a914af_516bff1b-7342-434e-8c95-c777bb7f237e failed with errno=2 description=Error(2): No such file or directory

Backend 1:Create file /tmp/impala-scratch/94869b99d0d6457:765d2bc009a914ad_94869b99d0d6457:765d2bc009a914af_516bff1b-7342-434e-8c95-c777bb7f237e failed with errno=2 description=Error(2): No such file or directory

One of my friends faced this issue and on investigation I found that this issue is because of the unexpected deletion of files inside the impala scratch directory. The intermediate files used during large sort, join, aggregation, or analytic function operations are stored in this scratch directory. By default the impala scratch directory is /tmp/impala-scratch. These directoroes will be deleted after the query execution. The best solution for this problem is to change the scratch directory to some other directory. This can be done by starting the impala daemon with the option –scratch_dirs=”path_to_directory”. This directory is in the local linux file system. Not in the hdfs. Impala will not start if it is not having proper read/write access to the files in the “scratch” directory.

If you are using impala in EMR cluster, to modify the start up options, make the changes in the bootstrap action. If you want to modify this conf in an existing EMR cluster, stop the service nanny in all the nodes and restart the impala with this scratch directory property. If service nanny is running, you will not be able to restart the impala with this new argument because the service nanny will perform the service restart before your restart .. 🙂

Service Nanny in AWS EMR

Service nanny is a service that runs in all the nodes of AWS EMR that controls the operation of daemons in each node.If a process gets killed because of OOM killer or overload etc, it restarts immediately and ensures that the service is alive. This service ensures that the cluster services are always alive without the problems created by unexpected exists in the services. So even if you kill a process or stop a process, it will get automatically restarted.

Recently I faced an issue with impala in AWS EMR. I was getting an error as described in this post. I was using a small  3 node EMR cluster. Instead of creating a new cluster I thought of restarting the impala daemon by specifying the additional arguments. But I was not able to perform this because the service nanny was performing the daemon start before I performing the start. So I stopped the service nanny in all the nodes and restarted impala with extra arguments and then restarted the service nanny.

We can modify service nanny control behavior by editing the config files present in /etc/service-nanny/ directory. You can see config files for each service controlled by service nanny. You can add/remove/modify the control actions  by adding/removing/modifying the config files.