How to attach a new EBS to an EC2 instance

Nowadays majority of us are using some cloud services. Amazon Web Services is one of the popular provider among all the other cloud service providers. Just like we upgrade our harddisk or mounting new drives to physical machines, we can attach new block storages to Amazon EC2 also. Amazon provides a service called EBS (Elastic Block Storage). There are various types of EBS with various speed and cost. Example are magnetic, SSD etc.

Attaching a new EBS to a running EC2 instance is very simple. We can do this programatically as well as using the console. Here I am explaining the basic steps to perform this operation using the console.

  1. Launch an EBS in the same region and same availability zone as that of the EC2 instance
  2. Note down the instance id of the EC2 instance
  3. Attach the EBS to the EC2. This can be done by using the attach option available in the EBS. The EBS will be listed under the Volumes section in EC2 service page of AWS console.
  4. Login to the EC2 instance and switch to the root user
  5. Type lsblk to list all the block devices
  6. Identify the new block device.
  7. Create a new directory to mount the EBS.
  8. Format the newly mounted storage. The command is mkfs -t ext4 /dev/<device-name>
  9. Mount the EBS on the directory. The command is mount /dev/<device-name>  <mount-dir>
  10. Check for the new storage. The command is df -h

 

 

Programmatic way to reboot EC2 instances

Sometimes we might have to reboot EC2 instances. If the requirement is to restart EC2 instances regularly, we can achieve it by writing a small piece of code. I also came across a similar requirement and a portion of the code I used is given below.

 

Service Nanny in AWS EMR

Service nanny is a service that runs in all the nodes of AWS EMR that controls the operation of daemons in each node.If a process gets killed because of OOM killer or overload etc, it restarts immediately and ensures that the service is alive. This service ensures that the cluster services are always alive without the problems created by unexpected exists in the services. So even if you kill a process or stop a process, it will get automatically restarted.

Recently I faced an issue with impala in AWS EMR. I was getting an error as described in this post. I was using a small  3 node EMR cluster. Instead of creating a new cluster I thought of restarting the impala daemon by specifying the additional arguments. But I was not able to perform this because the service nanny was performing the daemon start before I performing the start. So I stopped the service nanny in all the nodes and restarted impala with extra arguments and then restarted the service nanny.

We can modify service nanny control behavior by editing the config files present in /etc/service-nanny/ directory. You can see config files for each service controlled by service nanny. You can add/remove/modify the control actions  by adding/removing/modifying the config files.

Rhipe on AWS (YARN and MRv1)

Rhipe is an R library that runs on top of hadoop. Rhipe is using hadoop streaming concept for running R programs in hadoop. To know more about Rhipe, please check my older post. My previous post on Rhipe was the basic explanation and the installation steps for running Rhipe in cdh4(MRv1). Now yarn became popular and almost everyone are using YARN. So a lot of people asked me assistance for installing Rhipe in YARN. Rhipe works on yarn very well. Here I am just giving a pointer on how to install Rhipe on AWS (Amazon Web Services). I checked this script and it is working fine. This contains the bootstrap script and installables that installs Rhipe automatically in AWS. For those who are new to AWS, I will explain the basics of AWS EMR and bootstrap script. Amazon Web Services are providing a lot of cloud services. Among that Elastic Mapreduce(EMR) is a service that provides a hadoop cluster. This is one of the best solution for users who don’t want to maintain a data center and don’t want to take the headaches of hadoop administration. AWS is providing a list of components for installing in the hadoop cluster. Those services we can choose while installing the hadoop cluster through the web console. Examples for such components are hive, pig, impala, hue, hbase, hunk etc. But in most of the cases, user may require some extra softwares also. This extra requirement depends on user. If the user try to install the extra service manually in the cluster, it will take lot of time. The automated cluster launch will take less than 10 minutes.( I tried for around 100 nodes). But if you install the software in all of these nodes manually, it will take several hours. For this problem, amazon is providing a solution. User can provide any custom shell scripts and these scripts will be executed on all the nodes while installing the hadoop. This script is called bootstrap script. Here we are installing Rhipe using a bootstrap script. For users who want to install Rhipe on  AWS Hadoop MRv1, you can follow this url. Please ensure that you are using the correct AMI. AMI is Amazon Machine Image. This is just a version of the image that they are providing. For those users who want to install Rhipe on AWS Hadoop MRv2 (YARN), you can follow this url. This will work perfectly on AWS AMI 3.2.1. You can download the github repo in your local and put it your S3. Then launch the cluster by specifying the details mentioned in the installation doc.

For non-aws users

For those users who want to install Rhipe on yarn (Any hadoop cluster), you can either build the Rhipe for their corresponding version of hadoop and put that jar inside Rhipe directory or you can directly try using the ready made rhipe for YARN. All the Rhipe versions are available in a common repository. You can download the installable from this location. You have to follow the steps mentioned in the all the shell scripts present in the given repository. This is a manual activity and you have to do this activity on all the nodes in your hadoop cluster.