Installing Cloudera Manager in an existing Hadoop cluster

Cloudera Manager is an infrastructure management and monitoring tool provided by Cloudera. It has become an excellent tool for managing big data infrastructure; by my estimate it cuts an administrator's pain by about 80%. Almost everything an administrator needs is integrated into it, and it is very user friendly. Cloudera Manager only became this powerful recently, so a lot of existing clusters are still running without it. If you want to manage an existing cluster using Cloudera Manager, the following steps may help. You have to completely uninstall the existing Hadoop setup, but no data is lost because the data directories are never touched, and the configurations stay the same. These are just pointers.

1) Stop all the services
2) Back up the Hive metastore, the NameNode metadata and all other required metastores (e.g. Hue, Oozie)
3) Back up all the configurations
4) Note down the existing storage directories
5) Uninstall all the Hadoop services (never touch the data)
6) Install Cloudera Manager Server and Agent
7) Install all the services (use the same versions as before to make the installation smoother)
8) Add the configurations (use the same configurations as before; CM has an option to add XML configs)
9) Point the storage directory settings in Cloudera Manager to the existing storage directories
10) Point the new installation to the existing metastores (Hive, Oozie, Hue etc.)
11) Start all the services (don’t format the NameNode)
12) Test the cluster
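
Steps 3 and 4 can be partly scripted. The sketch below is one possible way to do it. It assumes a Hadoop 2 style install where `hdfs getconf` is available and the client configuration sits in a single directory. The function names and file paths are illustrative assumptions, not part of any Cloudera tooling.

```shell
# Record the NameNode and DataNode storage directories from the active config.
# Assumes Hadoop 2 property names; older releases use dfs.name.dir / dfs.data.dir.
record_storage_dirs() {
    out_file=$1
    {
        printf 'dfs.namenode.name.dir=%s\n' "$(hdfs getconf -confKey dfs.namenode.name.dir)"
        printf 'dfs.datanode.data.dir=%s\n' "$(hdfs getconf -confKey dfs.datanode.data.dir)"
    } > "$out_file"
}

# Archive a configuration directory into a compressed tarball.
# -h follows symlinks, since the config directory is often a symlink.
backup_config_dir() {
    conf_dir=$1
    out_tgz=$2
    tar -czhf "$out_tgz" -C "$(dirname "$conf_dir")" "$(basename "$conf_dir")"
}
```

A possible invocation, with assumed paths, is `record_storage_dirs /root/backup/storage-dirs.txt` followed by `backup_config_dir /etc/hadoop/conf /root/backup/hadoop-conf.tgz`. Metastore databases still need their own database-specific dump.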

Migrating the NameNode from one host to another

The NameNode is the heart of a Hadoop cluster, so it is usually installed on a better-quality machine than the other nodes. Migrating the NameNode from one host to another is a rare scenario, but when it is needed, the following steps are required.

Manual Approach

Method 1: (By migrating the hard drive)

  • Stop all the running jobs in the cluster
  • Put the NameNode into safe mode
    • hdfs dfsadmin -safemode enter
  • Execute the following command to save the current namespace to the storage directories and reset the edit logs
    • hdfs dfsadmin -saveNamespace
  • Stop the entire cluster
  • Remove the hard disk from the old NameNode host and attach it to the new NameNode host
  • Release the IP address from the old NameNode host and assign it to the new NameNode host
  • Start the new NameNode (DO NOT FORMAT IT)
  • Start all the services
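
The safe mode and checkpoint steps above can be wrapped in a small function so that the namespace is only saved once safe mode is confirmed. This is a minimal sketch, assuming it runs as the HDFS superuser on a host with the `hdfs` client. The function name is an illustrative assumption.

```shell
# Put the NameNode into safe mode, confirm it, then checkpoint the namespace.
# Stops at the first failure so saveNamespace never runs outside safe mode.
checkpoint_namenode() {
    hdfs dfsadmin -safemode enter || return 1
    # "-safemode get" reports "Safe mode is ON" once the NameNode is read-only.
    hdfs dfsadmin -safemode get | grep -q 'Safe mode is ON' || return 1
    hdfs dfsadmin -saveNamespace
}
```

Calling `checkpoint_namenode && echo "namespace saved"` before stopping the cluster gives a simple go/no-go signal.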

Method 2: (New hard drive)

  • Stop all the running jobs in the cluster
  • Put the NameNode into safe mode
    • hdfs dfsadmin -safemode enter
  • Execute the following command to save the current namespace to the storage directories and reset the edit logs
    • hdfs dfsadmin -saveNamespace
  • Stop the entire cluster
  • Log in to the NameNode host.
  • Navigate to the NameNode storage directories.
  • Copy the NameNode metadata, preferably as a compressed archive. Note down the folder and file permissions and ownership.
  • Take a backup of the configuration files.
  • Install the same NameNode version as the existing system on the new machine.
  • Ensure that the IP address of the old host is released and assigned to the new host.
  • Copy the configuration files and metadata to the new NameNode host.
  • Create the NameNode storage directory structure on the new host.
  • Maintain the same folder permissions and ownership on the new host.
  • If there are any changes in the NameNode directory structure, make the corresponding changes in the config files.
  • In case of a Kerberized cluster, create the appropriate principals for the new host and put the matching keytabs in place.
  • Start the new NameNode. (DO NOT FORMAT IT)
  • Start the remaining services.
  • Test the cluster by executing file system operations as well as MR operations.
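
The "copy the metadata and note down the permissions" step can be sketched as below. It assumes GNU `find`, `stat` and `tar` (as shipped with CentOS and Red Hat) and should be run as root so every file is readable. The function name and output file names are illustrative assumptions.

```shell
# Archive a NameNode storage directory and record ownership and modes beside it.
archive_namenode_metadata() {
    name_dir=$1      # for example, the directory set in dfs.namenode.name.dir
    out_prefix=$2    # files written: <prefix>.perms and <prefix>.tgz
    # One line per file: owner:group, octal mode, path (GNU stat syntax).
    find "$name_dir" -exec stat -c '%U:%G %a %n' {} + > "$out_prefix.perms" || return 1
    # tar stores owners and modes in the archive alongside the file contents.
    tar -czf "$out_prefix.tgz" -C "$(dirname "$name_dir")" "$(basename "$name_dir")"
}
```

On the new host, extracting as root with `tar -xzpf <prefix>.tgz -C <parent directory>` restores the stored permissions and ownership, and the `.perms` listing can be used to double-check them.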

Automated Approach in a cluster managed using Cloudera Manager (CM 5.4 or above)

If you are using Cloudera Manager 5.4 or above, there is a new feature known as NameNode Role Migration that migrates the NameNode from one host to another. It requires HDFS HA to be enabled.

Creating user home directories automatically in Linux when using LDAP

Users can be added to a Linux machine either by creating them manually or by syncing with an external authentication system such as LDAP. If you create users manually, their home directories are created automatically. If you sync with LDAP, however, the home directories are not created automatically by default. Creating them all by hand is tedious, because in most cases there are hundreds of users. There are some methods to enable automatic creation of user home directories.
One method uses pam_mkhomedir.so; another uses oddjob. The method I discuss here is oddjob, which is very easy to enable. My operating system is CentOS 6.4, and this solution works on Red Hat and CentOS systems.
First install the oddjob and oddjob-mkhomedir packages.

yum install oddjob oddjob-mkhomedir

Then start the oddjob service and make the daemon start automatically at boot.

chkconfig oddjobd on
service oddjobd start

After this, update the authentication configuration to instruct oddjob to create the user home directories automatically.

authconfig --enablemkhomedir --update

Now we are ready. The user home directories will be created automatically at first login.
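
To confirm the setup works, one option is to look up an LDAP user's home directory after that user has logged in once. The sketch below assumes `getent` can resolve the account. The function name is an illustrative assumption.

```shell
# Succeed if the account resolves (locally or via LDAP) and its home directory exists.
home_dir_exists() {
    user=$1
    # Field 6 of a passwd entry is the home directory.
    home=$(getent passwd "$user" | cut -d: -f6)
    [ -n "$home" ] && [ -d "$home" ]
}
```

For example, `home_dir_exists someuser && echo "home directory present"`, where `someuser` stands for any account that has already logged in.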

How to Change the Hostname of an Ubuntu Server?

We can change the hostname of a machine in several ways. Here is one way to change the hostname of an Ubuntu server.

  1. Open the terminal, or log in with putty.exe as the root user (if you are working remotely)
  2. Go to /etc/
  3. Type nano hostname
  4. Replace the existing hostname with your preferred machine name
  5. Press Ctrl+X
  6. Save the configuration by pressing Y, then Enter
  7. Log off or reboot

Note: Also add the new hostname and IP address to the /etc/hosts file
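
The same edit can be made without an interactive editor. The sketch below is one possible non-interactive version, to be run as root. The function names are illustrative assumptions, and the optional file arguments exist only so the functions can be tried against scratch copies first.

```shell
# Write the new name to the hostname file (a single line, nothing else).
write_hostname_file() {
    new_name=$1
    hostname_file=${2:-/etc/hostname}
    printf '%s\n' "$new_name" > "$hostname_file"
}

# Append "IP<TAB>name" to the hosts file unless the name is already listed.
add_hosts_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    if awk -v n="$name" '
        $1 ~ /^#/ { next }   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"
    then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

A call such as `write_hostname_file node1 && add_hosts_entry 192.0.2.10 node1` (both values are placeholders) covers steps 2 to 6 and the note; logging off or rebooting is still needed afterwards.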

How to Change the Hostname of CentOS or Red Hat Linux systems?

We can change the hostname of a machine in several ways. Here are two ways to change the hostname.

Method 1: Editing the /etc/sysconfig/network file

  1. Open the terminal, or log in with putty.exe as the root user (if you are working remotely)
  2. Go to /etc/sysconfig/
  3. Type nano network
  4. Change the HOSTNAME value to your preferred machine name
  5. Press Ctrl+X
  6. Save the configuration by pressing Y, then Enter
  7. Log off or reboot

Method 2: Editing the /proc/sys/kernel/hostname file

  1. Open the terminal, or log in with putty.exe as the root user (if you are working remotely)
  2. Go to /proc/sys/kernel/
  3. Type nano hostname
  4. Replace the current hostname with your preferred machine name
  5. Press Ctrl+X
  6. Save the configuration by pressing Y, then Enter
  7. Close the terminal and log in again

With this method, no reboot is required for the change to take effect. The change applies only to the running system, so combine it with Method 1 if the new name must survive a reboot.

Note: Also add the IP address and new hostname to the /etc/hosts file
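
Both methods can also be scripted. The sketch below assumes GNU sed (the default on CentOS and Red Hat) and root privileges. The function names are illustrative assumptions, and the optional file arguments are there so the functions can be tested on scratch files.

```shell
# Method 1: set HOSTNAME= in the network file, replacing or appending the line.
set_sysconfig_hostname() {
    new_name=$1
    network_file=${2:-/etc/sysconfig/network}
    if grep -q '^HOSTNAME=' "$network_file"; then
        # Valid hostnames (letters, digits, dots, hyphens) are safe inside the sed expression.
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$new_name/" "$network_file"
    else
        printf 'HOSTNAME=%s\n' "$new_name" >> "$network_file"
    fi
}

# Method 2: change the hostname of the running kernel; not kept across a reboot.
set_running_hostname() {
    new_name=$1
    kernel_file=${2:-/proc/sys/kernel/hostname}
    printf '%s\n' "$new_name" > "$kernel_file"
}
```

Running `set_sysconfig_hostname node1.example.com && set_running_hostname node1.example.com` (the name is a placeholder) applies the change immediately and keeps it after the next reboot.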