Programmatic way to reboot EC2 instances

Sometimes we have to reboot EC2 instances. If the instances need to be restarted regularly, this can be automated with a small piece of code. I came across such a requirement myself, and a portion of the code I used is given below.
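The original code is not reproduced here, so the following is only a minimal sketch of one way to do it, not the script referred to above. It assumes the AWS CLI is installed and configured with credentials that are allowed to reboot instances. The function name and the DRY_RUN switch are illustrative, and the instance ID in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: reboot a list of EC2 instances with the AWS CLI.
# Assumptions: the AWS CLI is installed and configured, and the instance IDs
# are passed as arguments. DRY_RUN is an illustrative safety switch.

# With DRY_RUN=1 (the default) the command is only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

reboot_instances() {
    if [ "$#" -eq 0 ]; then
        echo "usage: reboot_instances INSTANCE_ID..." >&2
        return 1
    fi
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be sent to AWS.
        echo "aws ec2 reboot-instances --instance-ids $*"
    else
        aws ec2 reboot-instances --instance-ids "$@"
    fi
}
```

After sourcing the file, calling reboot_instances i-0123456789abcdef0 prints the command, and DRY_RUN=0 reboot_instances i-0123456789abcdef0 actually sends the request. For regular reboots, a small script containing that call could be run from cron.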



bashrc file not loading automatically

Recently I faced an issue on my CentOS Linux machine. When I logged in to the machine, the bashrc file was not getting loaded, and because of this the environment variables defined in the bashrc file were not getting set either.

The solution for this issue is given below.

Create a file with the name .profile in the user’s home directory and add the following content to the file.

if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi
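One thing to keep in mind with this fix: a bash login shell reads only the first readable file among ~/.bash_profile, ~/.bash_login and ~/.profile, in that order. So a .profile will be ignored if one of the other two exists. The small helper below (the function name is just for illustration) reports which of these files would be picked up for a given home directory.

```shell
# Print the per-user startup file that a bash login shell would read from the
# given home directory (defaults to $HOME). Bash uses only the first readable
# file among these three, in this order.
login_startup_file() {
    local home_dir="${1:-$HOME}"
    local name
    for name in .bash_profile .bash_login .profile; do
        if [ -r "$home_dir/$name" ]; then
            echo "$home_dir/$name"
            return 0
        fi
    done
    # None of the three files exists.
    return 1
}
```

If this prints a file other than .profile, the snippet that sources ~/.bashrc has to go into that file instead.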

Utility to get the complete details of a Linux system

This is a small shell script that captures almost all the necessary details of a Linux system. I tested this script on CentOS and Red Hat operating systems. You can access this script directly from GitHub.
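To give an idea of what such a script collects, here is a much smaller sketch. It is not the script from GitHub. The function name and the choice of fields are my own, and it assumes a Linux system with /proc mounted.

```shell
# Sketch: print a short summary of a Linux system.
system_summary() {
    echo "Hostname : $(uname -n)"
    echo "Kernel   : $(uname -r)"
    echo "Arch     : $(uname -m)"
    # /etc/redhat-release exists on CentOS and Red Hat; otherwise fall back
    # to /etc/os-release if it is present.
    if [ -r /etc/redhat-release ]; then
        echo "OS       : $(cat /etc/redhat-release)"
    elif [ -r /etc/os-release ]; then
        echo "OS       : $(. /etc/os-release && echo "$PRETTY_NAME")"
    fi
    echo "CPUs     : $(getconf _NPROCESSORS_ONLN)"
    # MemTotal is reported by the kernel in kB.
    echo "Memory   : $(awk '/^MemTotal:/ {print $2, $3}' /proc/meminfo)"
    echo "Disk usage:"
    df -h
}
```

Calling system_summary > details.txt saves the report to a file.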

How to add EPEL Repository in Linux?

Linux is my favourite operating system. I like Windows for multimedia activities, but when it comes to work and experiments, I prefer Linux. Linux gives us the flexibility to perform all kinds of operations, and it is a vast ocean to explore. Most of us have heard about EPEL, and we download a lot of packages from it.

But does everyone know what EPEL is?
EPEL stands for Extra Packages for Enterprise Linux. It is an open source repository maintained by the community, and it contains a lot of useful software packages for Red Hat, CentOS and Scientific Linux. We can find packages for almost every need in this repository.

  • The EPEL repository is 100% open source and is free to use.
  • No extra effort is required to install these packages.
  • Version-specific packages are available for each OS version, so they will not conflict with the existing packages in the OS.
  • Packages can be installed simply by using yum.

By default, the EPEL repository is not enabled on a Linux machine. We have to add it explicitly by downloading the EPEL release package and installing it, which is a single rpm install. The following steps add the EPEL repository to your CentOS/Red Hat machine.

RHEL/CentOS 7 64-Bit

# wget
# rpm -ivh epel-release-7-5.noarch.rpm

RHEL/CentOS 6 32-Bit

# wget
# rpm -ivh epel-release-6-8.noarch.rpm

RHEL/CentOS 6 64-Bit

# wget
# rpm -ivh epel-release-6-8.noarch.rpm

RHEL/CentOS 5 32-Bit

# wget
# rpm -ivh epel-release-5-4.noarch.rpm

RHEL/CentOS 5 64-Bit

# wget
# rpm -ivh epel-release-5-4.noarch.rpm

RHEL/CentOS 4 32-Bit

# wget
# rpm -ivh epel-release-4-10.noarch.rpm

RHEL/CentOS 4 64-Bit

# wget
# rpm -ivh epel-release-4-10.noarch.rpm
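If the same steps have to run on machines with different OS versions, picking the package name can be scripted. The sketch below only maps a major version to the rpm file names listed above. The two function names are illustrative, and the download location is deliberately left out.

```shell
# Map a RHEL/CentOS major version to the epel-release package file name
# listed in the steps above.
epel_release_rpm() {
    case "$1" in
        7) echo "epel-release-7-5.noarch.rpm" ;;
        6) echo "epel-release-6-8.noarch.rpm" ;;
        5) echo "epel-release-5-4.noarch.rpm" ;;
        4) echo "epel-release-4-10.noarch.rpm" ;;
        *) echo "unsupported major version: $1" >&2; return 1 ;;
    esac
}

# Extract the major version from a release string such as
# "CentOS release 6.4 (Final)", i.e. the first run of digits in it.
major_version() {
    echo "$1" | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p'
}
```

On a CentOS or Red Hat machine, epel_release_rpm "$(major_version "$(cat /etc/redhat-release)")" prints the matching file name. After the rpm is installed, yum repolist should show the epel repository in its output.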

Creating user home directories automatically in linux in case of LDAP

Users can be added to a Linux machine either by creating them manually or by syncing with an external authentication system such as LDAP. If you create users manually, their home directories are created automatically. But if you are syncing with LDAP, the home directories are not created automatically by default. Creating all the home directories by hand is a tedious job, because in most cases there will be hundreds of users. Fortunately, there are methods to enable automatic creation of user home directories.
There is more than one way to do this. The method I am going to discuss here uses oddjob, and it is very easy to enable. My operating system is CentOS 6.4, and this solution works on Red Hat and CentOS operating systems.
First, install the oddjob and oddjob-mkhomedir packages.

yum install oddjob oddjob-mkhomedir

Then enable the oddjob daemon so that it starts automatically at boot, and start the service.

chkconfig oddjobd on
service oddjobd start

After this, we have to update our authentication configuration to instruct oddjob to create the user home directories automatically.

authconfig --enablemkhomedir --update

Now we are ready. The user home directories will be created automatically on login.
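A quick way to check that the change was applied is to look for a mkhomedir session line in the PAM configuration. The sketch below assumes that authconfig writes this line to /etc/pam.d/system-auth, which is what I would expect on CentOS 6 but is worth verifying on your own system. The function name is illustrative.

```shell
# Return success if the given PAM config file has an active (uncommented)
# session line that references a mkhomedir module.
# The default path is an assumption about where authconfig writes the entry.
pam_mkhomedir_enabled() {
    local pam_file="${1:-/etc/pam.d/system-auth}"
    grep -Eq '^[[:space:]]*-?session[[:space:]].*mkhomedir' "$pam_file"
}
```

For example, pam_mkhomedir_enabled && echo "mkhomedir is enabled" prints the message only when such a line is present.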

Service Nanny in AWS EMR

Service nanny is a service that runs on all the nodes of AWS EMR and controls the operation of the daemons on each node. If a process gets killed, for example by the OOM killer or because of overload, service nanny restarts it immediately and ensures that the service stays alive. This keeps the cluster services running despite unexpected exits. So even if you kill or stop a process yourself, it will be restarted automatically.

Recently I faced an issue with Impala in AWS EMR. I was getting an error as described in this post. I was using a small 3 node EMR cluster. Instead of creating a new cluster, I thought of restarting the Impala daemon with additional arguments. But I was not able to do this, because service nanny was starting the daemon before I could start it myself. So I stopped service nanny on all the nodes, restarted Impala with the extra arguments, and then started service nanny again.

We can modify the control behaviour of service nanny by editing the config files present in the /etc/service-nanny/ directory. There is a config file for each service controlled by service nanny. You can add, remove or modify control actions by adding, removing or modifying these config files.
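The stop, restart and start sequence described above can be sketched as follows. This is only an outline under assumptions: I assume service nanny is controlled through an init script named service-nanny, and the daemon name is passed in as an argument (the name impala-server in the usage note is hypothetical). How the extra arguments are handed to the daemon depends on the daemon and is not shown.

```shell
# Sketch of the stop / restart / start sequence, to be run on each node.
# Assumptions: the init script is called "service-nanny" and the daemon to
# restart is given as the first argument. DRY_RUN is an illustrative switch.

# With DRY_RUN=1 (the default) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_without_nanny() {
    local daemon="$1"
    # Stop the nanny first so that it does not restart the daemon for us.
    run sudo service service-nanny stop
    run sudo service "$daemon" restart
    # Bring the nanny back so that the daemon is supervised again.
    run sudo service service-nanny start
}
```

Calling restart_without_nanny impala-server prints the three commands in order; with DRY_RUN=0 they are executed instead.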

Monitoring Tools for Hadoop Clusters

To get the exact status of server machines, we use monitoring tools. Nowadays a lot of monitoring tools are available with good monitoring and alerting capabilities. I was searching for a good monitoring and alerting tool for Hadoop clusters. During this search, I found the following free tools.

1) Ganglia
2) Nagios
3) Zabbix
4) Cloudera Manager
5) Apache Ambari

My observations are as follows.

To use Ambari or Cloudera Manager, the cluster has to be installed using that tool itself.
That means we cannot monitor an existing cluster using these tools.

Ganglia provides good metrics, and we can also capture custom metrics with it. Ganglia is very flexible. Hadoop comes with a set of configuration properties that can be used for capturing Hadoop metrics with Ganglia. These properties can be seen in the Hadoop configuration. The new Ganglia web UI is very good, and it can export the metrics as CSV or JSON files, which is a very useful feature.

But Ganglia does not have an alerting capability, such as sending mails in case of issues. This is where Nagios comes in. A Nagios-Ganglia integration is a good choice for monitoring Hadoop clusters, because it gives us both good metrics capture and alerting.

Ganglia is free. The base version of Nagios is also free, and it serves our needs.

Zabbix is also a good tool. A lot of production clusters run with Zabbix as the monitoring tool.

Deployment and Management of Hadoop Clusters

Linux Filesystem colour codes

When we run ls --all in the Linux CLI, files may be listed in different colours.

The colour code of the files is as follows:

Blue: Directory file

White: Normal file

Green: Executable file

Yellow: Device file

Magenta: Picture file

Cyan: Link file

Red: Compressed file

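File Symbol

The file symbol is the first character of each line in the long listing (ls -l).

- (Hyphen) = Normal file

d = Directory file

l = Link file

b = Block device file

c = Character device file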
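The symbols can also be read in a script. The small function below (its name is illustrative) looks at the first character of the ls -ld output for a path and prints the matching file type. It also handles d, the standard symbol for directories.

```shell
# Describe a path using the first character of its "ls -ld" mode string.
file_type() {
    local mode
    mode=$(ls -ld -- "$1") || return 1
    case "$mode" in
        -*) echo "normal file" ;;
        d*) echo "directory" ;;
        l*) echo "link" ;;
        b*) echo "block device" ;;
        c*) echo "character device" ;;
        *)  echo "other" ;;
    esac
}
```

For example, file_type /dev/null prints "character device" on a typical Linux system.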