Hadoop is architected so that a NameNode with only a few GB of RAM can manage terabytes of data, because the NameNode stores only the metadata. FSIMAGE and EDITS are the two most important NameNode files. FSIMAGE holds a snapshot of all the metadata of the Hadoop cluster, whereas EDITS records the incremental changes made to that metadata since the snapshot. When a NameNode starts up, it reads the HDFS state from an image file, fsimage, and then applies the edits from the edits log file. It then writes the new HDFS state to the fsimage and starts normal operation with an empty edits file. Since the NameNode merges the fsimage and edits files only during startup, the edits log file can grow very large over time on a busy cluster. A further side effect of a large edits file is that the next restart of the NameNode takes longer. To avoid these issues, we have the secondary NameNode.
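The startup sequence described above can be sketched as a small toy model. This is only an illustration of the idea (load the snapshot, replay the log, save a fresh snapshot, start with an empty log). The names `apply_edit` and `start_namenode`, the dict-based namespace, and the three operation types are assumptions made for the example and do not reflect Hadoop's actual on-disk formats or classes.

```python
# Toy model of NameNode startup (hypothetical names, not Hadoop's real code).

def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (a dict)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def start_namenode(fsimage, edits):
    """Return (new_fsimage, new_edits) as they would look after startup."""
    # Copy the snapshot so the caller's fsimage is left untouched.
    namespace = {path: {"blocks": list(meta["blocks"])}
                 for path, meta in fsimage.items()}
    # Replay every logged change on top of the snapshot, in order.
    for edit in edits:
        apply_edit(namespace, edit)
    # The merged state becomes the new fsimage; the edits log starts empty.
    return namespace, []


fsimage = {"/logs/a.txt": {"blocks": ["blk_1"]}}
edits = [
    ("create", "/logs/b.txt"),
    ("add_block", "/logs/b.txt", "blk_2"),
    ("delete", "/logs/a.txt"),
]
new_fsimage, new_edits = start_namenode(fsimage, edits)
print(new_fsimage)  # {'/logs/b.txt': {'blocks': ['blk_2']}}
print(new_edits)    # []
```

The loop over `edits` is why a long edits log slows down a restart: the replay cost grows with the number of logged operations.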
The function of the secondary NameNode is to merge the fsimage and edits log files periodically, which keeps the edits log size within a limit. It is usually run on a different machine than the primary NameNode, since its memory requirements are on the same order as those of the primary NameNode.
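As a rough sketch of what "periodically" and "within a limit" can mean in practice: the Hadoop 1.x documentation describes two settings, fs.checkpoint.period (one hour by default) and fs.checkpoint.size (64 MB by default), and a checkpoint starts when either threshold is reached. The text above does not say which Hadoop version it describes, so treat these names and defaults as assumptions. Later releases use different properties (dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns). The function `should_checkpoint` is a hypothetical helper, not part of Hadoop.

```python
# Assumed Hadoop 1.x defaults; check your own configuration before relying on them.
CHECKPOINT_PERIOD_SECONDS = 3600          # fs.checkpoint.period
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024  # fs.checkpoint.size


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      size=CHECKPOINT_SIZE_BYTES):
    """Hypothetical helper: True when either threshold has been reached."""
    return seconds_since_last >= period or edits_size_bytes >= size


# Ten minutes since the last checkpoint, 1 MB of edits: nothing to do yet.
print(should_checkpoint(600, 1 * 1024 * 1024))    # False
# Ten minutes since the last checkpoint, but the edits log reached 80 MB.
print(should_checkpoint(600, 80 * 1024 * 1024))   # True
# Quiet cluster, but more than an hour has passed.
print(should_checkpoint(4000, 0))                 # True
```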
The messages exchanged between the primary NameNode and the secondary NameNode are as follows.
1. The secondary NameNode asks the primary NameNode to start writing incremental changes to a new file, EDITS.NEW.
2. The secondary NameNode copies the FSIMAGE and EDITS files from the primary NameNode.
3. The secondary NameNode applies the EDITS to the FSIMAGE and produces a new FSIMAGE file.
4. The secondary NameNode sends the new FSIMAGE back to the primary NameNode.
5. The primary NameNode replaces its old FSIMAGE with the new one and renames EDITS.NEW to EDITS.
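The five steps above can be sketched as a small in-memory simulation. It is a minimal model under stated assumptions: the class `PrimaryNameNode`, the functions `merge` and `checkpoint`, and the set-of-paths namespace are all hypothetical names chosen for the example, and real Hadoop transfers files over the network rather than passing Python objects.

```python
class PrimaryNameNode:
    """Toy primary NameNode holding FSIMAGE, EDITS and (temporarily) EDITS.NEW."""

    def __init__(self, fsimage, edits):
        self.fsimage = set(fsimage)   # snapshot: the set of existing paths
        self.edits = list(edits)      # changes made since that snapshot
        self.edits_new = None         # exists only while a checkpoint runs

    def log(self, edit):
        # While a checkpoint is in progress, new changes go to EDITS.NEW.
        target = self.edits if self.edits_new is None else self.edits_new
        target.append(edit)

    def roll_edits(self):
        # Step 1: start writing incremental changes to EDITS.NEW.
        self.edits_new = []

    def get_files(self):
        # Step 2: hand out copies of FSIMAGE and EDITS.
        return set(self.fsimage), list(self.edits)

    def install(self, new_fsimage):
        # Steps 4 and 5: accept the new FSIMAGE, rename EDITS.NEW to EDITS.
        self.fsimage = new_fsimage
        self.edits, self.edits_new = self.edits_new, None


def merge(fsimage, edits):
    # Step 3: the secondary applies EDITS to FSIMAGE to build a new FSIMAGE.
    merged = set(fsimage)
    for op, path in edits:
        if op == "create":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
    return merged


def checkpoint(primary, concurrent_edits=()):
    """Run one checkpoint; concurrent_edits arrive while it is in progress."""
    primary.roll_edits()
    fsimage, edits = primary.get_files()
    for edit in concurrent_edits:
        primary.log(edit)
    primary.install(merge(fsimage, edits))


primary = PrimaryNameNode({"/a"}, [("create", "/b"), ("delete", "/a")])
checkpoint(primary, concurrent_edits=[("create", "/c")])
print(primary.fsimage)  # {'/b'}
print(primary.edits)    # [('create', '/c')]
```

The change to `/c` arrives after step 1, so it is not part of the merged image. It survives in the renamed EDITS file and will be folded in by the next checkpoint, which is the reason EDITS.NEW exists at all.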