Changing the Replication factor of an existing Hadoop cluster

I have an 8 node cluster. I faced an issue that the storage space was almost occupied and I want to get some more free space. So I had decided to reduce the replication factor from 3 to 2.
For that I editted the dfs.replication property in the hdfs-site.xml of all the nodes and restarted the hdfs. But this will set the replication to 2 only for the newly coming files. So inorder to change the entire existing cluster data to a replication factor to 2, run the following command from the superuser.

hadoop fs -setrep -R -w 2 /

After doing these steps, the entire hdfs data will be replicated twice only.
Similarly you can change the replication factor to any number. 🙂


Hadoop commands in hive command line interface

We can execute hadoop commands in hive cli. It is very simple.
Just put an exclamation mark (!) before your hadoop command in hive cli and put a semicolon (;) after your command.


hive> !hadoop fs –ls / ;

drwxr-xr-x   - hdfs supergroup          0 2013-03-20 12:44 /app
drwxrwxrwx   - hdfs supergroup          0 2013-05-23 11:54 /tmp
drwxr-xr-x   - hdfs supergroup          0 2013-05-08 18:47 /user

Very simple.. 🙂