Advertisements

Changing the Replication factor of an existing Hadoop cluster

I have an 8 node cluster. I faced an issue that the storage space was almost occupied and I want to get some more free space. So I had decided to reduce the replication factor from 3 to 2.
For that I editted the dfs.replication property in the hdfs-site.xml of all the nodes and restarted the hdfs. But this will set the replication to 2 only for the newly coming files. So inorder to change the entire existing cluster data to a replication factor to 2, run the following command from the superuser.

hadoop fs -setrep -R -w 2 /

After doing these steps, the entire hdfs data will be replicated twice only.
Similarly you can change the replication factor to any number. 🙂