Deployment and Management of Hadoop Clusters

Upgrading Hadoop Clusters

The other day, my friends and I tried a Hadoop cluster upgrade.

We tried two upgrades and both were successful.

One was from a CDH3u1 cluster to CDH 4.3.0 and the other from CDH 4.1.2 to CDH 4.3.0.
For the upgrade, we need to upgrade both the Hadoop installation and the filesystem.
It was a nice experience.

The steps we followed are listed below.

First we checked the filesystem for missing blocks and generated a report of the entire filesystem.

As the HDFS superuser (hdfs), we executed the following commands:

hadoop dfsadmin -report > reportold.log

hadoop fsck / > fsckold.log

These commands give us a report and the status of the entire filesystem.

We can keep these for future comparison.

If any issues are found in the reports, take the necessary actions to fix them.
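
A quick way to spot such issues is to search the saved fsck output for problem indicators; a healthy filesystem report ends with a "HEALTHY" status line. A minimal sketch (the exact wording in the output may vary slightly between versions):

# look for corrupt, missing or under-replicated blocks in the saved report
grep -iE "corrupt|missing|under replicated" fsckold.log

# the last lines of the fsck output show the overall filesystem status
tail -n 20 fsckold.log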

If everything is fine, we can move further with the upgrade process.

After this, we stopped all the Hadoop processes.
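
On a package-based CDH install, the daemons can be stopped through their init scripts. A minimal sketch, to be run on every machine in the cluster (the exact set of services depends on which roles run on each node):

# stop every Hadoop service installed on this machine
for svc in /etc/init.d/hadoop-*; do sudo $svc stop; done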

To ensure there is no accidental data loss, we backed up our namenode and datanode storage,

i.e. the directories configured as dfs.name.dir and dfs.data.dir.
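
One simple way to take this backup is to tar the storage directories while the daemons are stopped. The paths below are only placeholders; use the values configured in your hdfs-site.xml:

# on the namenode machine (path is an example; check dfs.name.dir)
sudo tar -czf /backup/namenode-$(date +%F).tar.gz /app/hadoop/name

# on each datanode machine (path is an example; check dfs.data.dir)
sudo tar -czf /backup/datanode-$(date +%F).tar.gz /app/hadoop/data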

After that, we copied the Hadoop configuration files and saved them in a different location for later use.
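
On a package-based CDH install the configuration normally lives under /etc/hadoop/conf (an alternatives-managed symlink), so the copy can be as simple as the sketch below; the backup path is just an example:

# -L dereferences the symlink so the actual config files get copied
sudo cp -rL /etc/hadoop/conf /backup/hadoop-conf-old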

Then we uninstalled the old Hadoop installation entirely.

Care should be taken to keep the contents of dfs.name.dir and dfs.data.dir intact.

We created a CDH 4.3.0 local repository and installed CDH 4.3.0 on all the machines, just as with the old version. The installation steps are described in my previous posts (a rough sketch of the package installation is shown after the links):

Creating A Local YUM Repository

Hadoop Installation
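
For reference, on an RPM-based system the installs from the local repository look roughly like this; the exact package list depends on which roles each machine runs, and this sketch assumes an MRv1 setup:

# on the namenode machine
sudo yum install hadoop-hdfs-namenode

# on the secondary namenode machine
sudo yum install hadoop-hdfs-secondarynamenode

# on the datanode / tasktracker machines
sudo yum install hadoop-hdfs-datanode hadoop-0.20-mapreduce-tasktracker

# on the jobtracker machine
sudo yum install hadoop-0.20-mapreduce-jobtracker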

Then we added back the configuration files that we had copied from the older installation.

We pointed the dfs.name.dir and dfs.data.dir to the correct locations.
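
A quick sanity check after restoring the configuration (assuming the default /etc/hadoop/conf location) is to print the two properties and confirm the paths:

grep -A 1 -E "dfs.name.dir|dfs.data.dir" /etc/hadoop/conf/hdfs-site.xml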

After doing this, execute the following command on the namenode machine.

/etc/init.d/hadoop-hdfs-namenode upgrade

Or

service hadoop-hdfs-namenode upgrade

This will start the namenode and upgrade the Hadoop filesystem to the newer version.
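
The progress of the upgrade can be followed in the namenode log; on a CDH 4 package install the logs are normally under /var/log/hadoop-hdfs (this path is an assumption, check your installation):

tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log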

After this, start all the other daemons and check whether everything is working fine.
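
For example, on a CDH 4 cluster running MRv1, the remaining daemons can be started with their init scripts; run the appropriate command on each machine:

# on each datanode
sudo service hadoop-hdfs-datanode start

# on the secondary namenode machine
sudo service hadoop-hdfs-secondarynamenode start

# MRv1 daemons, if used
sudo service hadoop-0.20-mapreduce-jobtracker start
sudo service hadoop-0.20-mapreduce-tasktracker start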

Check the filesystem using the commands below (execute them as the HDFS superuser):

hadoop dfsadmin -report > reportnew.log

hadoop fsck / > fscknew.log

Compare reportnew.log and fscknew.log with reportold.log and fsckold.log.
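
A plain diff is usually enough for this comparison; the block counts, number of live datanodes and DFS used figures should match, while small differences such as timestamps are expected:

diff reportold.log reportnew.log
diff fsckold.log fscknew.log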

Note: If we are not satisfied with the upgrade, we can roll back to the previous version. This can be done by uninstalling the newer version, installing the older version and executing the command

/etc/init.d/hadoop-hdfs-namenode rollback

This can be done only once, and it cannot be done once the upgrade is finalized.

If both reports are the same and there are no missing blocks, we can finalize the upgrade.

Stop all the daemons and execute the following command on the namenode machine.

/etc/init.d/hadoop-hdfs-namenode finalizeUpgrade

Once the upgrade is finalized, we cannot rollback.

Note: From our experience we found that the CDH 4.1.2 filesystem and the CDH 4.3.0 filesystem are compatible, i.e. we found CDH 4.3.0 working properly with CDH 4.1.2's filesystem even without executing the upgrade command.

Backup Mechanism for the Namenode

The namenode is the single point of failure in a Hadoop cluster, because it stores the metadata of the entire Hadoop filesystem.
So extra care should be taken in maintaining it. We use the best hardware for the namenode machine.
Even with the best hardware, complete protection cannot be guaranteed, because hardware issues can happen at any time. So a backup for the namenode is very necessary.
One of the methods is to create a simple backup store by mounting a partition of another machine, located in a different place, onto the namenode machine.
The backup machine should have the same hardware/software specifications as the namenode machine and should have Hadoop installed in the same way, but the Hadoop services are not started on it.
In case of failure, we can start the namenode on this backup machine and it runs like the normal namenode. The only thing we need to do is assign the IP address/hostname of the actual namenode to the backup machine.

In hdfs-site.xml, we give an additional value to the dfs.name.dir property,
i.e. the actual location followed by the backup location.

Eg:

<property>
  <name>dfs.name.dir</name>
  <value>/app/hadoop/name,/app/hadoop/backup</value>
  <description>
    Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.
  </description>
</property>

Here /app/hadoop/name is the actual namenode storage location and /app/hadoop/backup is the location where the partition is mounted for storing the namenode backup.
In case of failure of the first namenode machine, the namenode data will be safe on the second (backup) machine, so we can start the namenode there.
The second machine is placed in a different location and is provided with a different power supply, so that the failure dependencies of the two machines are different, making the backup effective.
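
For example, the backup location used above could be a directory exported over NFS from the backup machine and mounted on the namenode; the hostname and paths here are only placeholders:

# on the namenode machine, mount the directory exported by the backup machine
sudo mkdir -p /app/hadoop/backup
sudo mount -t nfs backup-host:/app/hadoop/backup /app/hadoop/backup

# make the mount persistent across reboots
echo "backup-host:/app/hadoop/backup /app/hadoop/backup nfs defaults 0 0" | sudo tee -a /etc/fstab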