Changing the Replication factor of an existing Hadoop cluster

I have an 8-node cluster. The storage space was almost fully occupied and I wanted to free some of it up, so I decided to reduce the replication factor from 3 to 2.
To do that, I edited the dfs.replication property in hdfs-site.xml on all the nodes and restarted HDFS. But this sets the replication factor to 2 only for newly written files. So, in order to change all the existing data in the cluster to a replication factor of 2, run the following command as the HDFS superuser. The -w flag makes the command wait until the replication change has completed, which can take a long time on a large cluster.

hadoop fs -setrep -R -w 2 /

After these steps, every block in HDFS will be stored as only two replicas. The NameNode removes the excess replicas of the existing data, which is what frees up the space.
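
To confirm that the change took effect, something like the following can be used. This is only a sketch: check_replication is a hypothetical helper name, and the path passed to it is whatever file you want to inspect. The commands inside it (hdfs getconf, hadoop fs -stat with the %r format, and hdfs fsck) are standard Hadoop tools.

```shell
# Sketch: inspect the replication factor after the change.
# check_replication is a hypothetical helper name, not part of Hadoop.
check_replication() {
  if [ -z "$1" ]; then
    echo "usage: check_replication <hdfs-file-path>" >&2
    return 2
  fi
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs command not found on PATH" >&2
    return 1
  fi
  # Default replication factor configured on this node (applies to new files)
  hdfs getconf -confKey dfs.replication
  # Replication factor recorded for the given file (%r)
  hadoop fs -stat %r "$1"
  # Filesystem check for the path; its summary includes the average block replication
  hdfs fsck "$1"
}
```

For example, check_replication /user/example/data.csv (an assumed path) should report 2 for a file that existed before the change, once the setrep command has finished.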
Similarly, you can change the replication factor to any other value. 🙂
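
To get a rough idea of how much raw disk space a given replication factor will occupy, the arithmetic is simply the logical data size multiplied by the replication factor. The sketch below uses a hypothetical helper, raw_usage_gb, and an assumed 1000 GB of data purely for illustration; it ignores non-HDFS usage and other overheads.

```shell
# Hypothetical helper: raw disk usage (GB) = logical data size (GB) * replication factor
raw_usage_gb() {
  echo $(( $1 * $2 ))
}

# Assumed example: 1000 GB of logical data (illustrative number, not from my cluster)
before=$(raw_usage_gb 1000 3)
after=$(raw_usage_gb 1000 2)
echo "raw usage before: ${before} GB, after: ${after} GB, freed: $(( before - after )) GB"
```

Going from 3 to 2 therefore frees about one third of the raw space that the HDFS data was occupying.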

About amalgjose
I am an Electrical Engineer by qualification and now work as a Software Engineer. I am very interested in the electrical, electronics, mechanical and now software fields, and I like exploring them. I like travelling and long drives, and I am very much addicted to music.

4 Responses to Changing the Replication factor of an existing Hadoop cluster

  1. Will the HDFS service be available after running the command “hadoop fs -setrep -R -w 2”? Or will it be available only after this process completes?

  2. Siddharth says:

    Will the old replicas get deleted, or will they remain in HDFS storage?
