I have an 8-node cluster. Its storage was almost full and I wanted to free up some space, so I decided to reduce the replication factor from 3 to 2.
To do that, I edited the dfs.replication property in hdfs-site.xml on all the nodes and restarted HDFS. However, this sets the replication factor to 2 only for newly created files. To change all the existing data in the cluster to a replication factor of 2, run the following command as the HDFS superuser.
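To get a rough idea of how much space such a change can free, the arithmetic can be sketched in shell. This is only an estimate under the assumption that every file currently uses the old replication factor; the function name freed_space and the figure of 900 GB are hypothetical and not taken from my cluster.

```shell
# Estimate the raw space released by lowering the replication factor.
# Arguments: raw space currently used, old factor, new factor.
# Assumes all files use the old factor (an assumption, not a guarantee).
freed_space() {
    used=$1
    old=$2
    new=$3
    # Each block drops (old - new) of its old copies, so the same
    # fraction of the raw used space is released. Integer arithmetic.
    echo $(( used * (old - new) / old ))
}

# Hypothetical example: 900 GB of raw usage, going from 3 replicas to 2
freed_space 900 3 2
```

Going from 3 to 2 releases about one third of the raw space in use, so it buys time rather than solving a capacity problem for good.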
hadoop fs -setrep -R -w 2 /
After these steps, every existing file in HDFS will have only two replicas.
Similarly you can change the replication factor to any number. 🙂
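To confirm that a file really picked up the new replication factor, something like the following sketch can help. It relies on the %r format of hadoop fs -stat, which prints a file's replication factor; the helper names replication_of and has_replication and the path in the example are hypothetical. Point it at a file rather than a directory, since directories do not carry a replication factor of their own.

```shell
# Print the replication factor recorded for one HDFS file.
replication_of() {
    # %r in the -stat format string is the replication factor
    hadoop fs -stat %r "$1"
}

# Succeed only if the file has the expected replication factor.
# Arguments: HDFS file path, expected factor.
has_replication() {
    [ "$(replication_of "$1")" = "$2" ]
}

# Example (hypothetical path):
#   has_replication /user/data/sample.txt 2 && echo "replication is 2"
```

For a cluster-wide view, hdfs fsck / prints a summary that includes the average block replication and the number of over-replicated blocks.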
Will the HDFS service be available while the command “hadoop fs -setrep -R -w 2” is running, or only after the process completes?
This will not cause any downtime for HDFS.
Will the old replicated files be deleted, or will they remain in HDFS storage?
Yes. The extra replicas will be deleted after some time.
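One way to watch the space come back is to record the used space before and after the change. The sketch below assumes that hadoop fs -df / prints a header line followed by a single data row whose third column is the used space in bytes; check the output on your own version first, because extra warning lines would break this parsing. The function name hdfs_used_bytes is hypothetical.

```shell
# Print the used space reported for the HDFS root, in bytes.
# Assumes the layout: header line, then "Filesystem Size Used Available Use%".
hdfs_used_bytes() {
    hadoop fs -df / | awk 'NR == 2 { print $3 }'
}

# Example: take one reading before setrep and another some time after,
# then compare the two numbers.
#   before=$(hdfs_used_bytes)
#   ... run setrep and wait ...
#   after=$(hdfs_used_bytes)
#   echo "released: $(( before - after )) bytes"
```

The number usually drops gradually rather than all at once, because the excess replicas are removed in the background.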