September 1, 2014
In Hadoop, data is stored as blocks. In every file system, blocks are the basic unit of storage. The Hadoop block size is much larger than the disk-level and OS-level block sizes because Hadoop deals with large data: a small block size would result in more seek time and more metadata, which ultimately results in poor performance. By default the block size in Hadoop is 64 MB (in some versions and distributions it is 128 MB, e.g. Hadoop 2.x and Amazon EMR Hadoop). We can change the block size based on our requirements. We can change it for a single file, for a set of files, or for the entire HDFS.
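To make the metadata argument concrete, here is a small sketch that counts how many blocks one file needs at two block sizes. The function name `block_count` is made up for this illustration, and 4 KiB is only an assumed, typical OS-level block size.

```shell
#!/usr/bin/env bash
# Number of blocks needed to hold a file: ceil(file_size / block_size).
# Integer ceiling division avoids floating point.
# block_count is a hypothetical helper, not part of Hadoop.
block_count() {
  local file_size=$1 block_size=$2
  echo $(( (file_size + block_size - 1) / block_size ))
}

ONE_GIB=$(( 1024 * 1024 * 1024 ))

# The same 1 GiB file stored with two different block sizes.
block_count "$ONE_GIB" $(( 4 * 1024 ))          # 4 KiB blocks  -> 262144
block_count "$ONE_GIB" $(( 64 * 1024 * 1024 ))  # 64 MiB blocks -> 16
```

Every block is an entry the NameNode has to track, so the same file costs 262144 entries at 4 KiB but only 16 at 64 MB.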
The property that sets the block size is defined in hdfs-site.xml. The property name is dfs.blocksize (the old name, dfs.block.size, is deprecated).
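As a rough sketch, an hdfs-site.xml entry for this property might look like the following. The value of 128 MB (written in bytes) is only an illustrative choice, not a recommendation.

```xml
<!-- hdfs-site.xml: default block size for new files -->
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <!-- 128 MB expressed in bytes (128 * 1024 * 1024); illustrative value -->
    <value>134217728</value>
  </property>
</configuration>
```

Files already stored in HDFS keep the block size they were written with. The setting applies to files written after the change.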
To check the default block size, we can use a simple command. It prints the default block size configured for the HDFS client.
> hdfs getconf -confKey dfs.blocksize
This may not be the block size of the files already stored in HDFS. If we specify a block size when storing a file, the file is stored with that block size; otherwise it is stored with the default block size.
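For a single file, one way to override the default is to pass the property as a generic option when uploading, and then read back the block size the file was actually stored with. This is a minimal sketch: the helper `mib_to_bytes` and both paths are hypothetical, and the HDFS commands only run if the `hdfs` client is on the PATH.

```shell
#!/usr/bin/env bash
# mib_to_bytes is a hypothetical helper: converts MiB to bytes,
# the unit used for a plain numeric dfs.blocksize value.
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

BLOCK_SIZE=$(mib_to_bytes 128)   # 134217728

# Hypothetical paths, used only for illustration.
LOCAL_FILE=./local-file.dat
HDFS_FILE=/user/example/local-file.dat

if command -v hdfs >/dev/null 2>&1; then
  # Upload with a per-file block size instead of the client default.
  hdfs dfs -D dfs.blocksize="$BLOCK_SIZE" -put "$LOCAL_FILE" "$HDFS_FILE"
  # %o prints the block size of the stored file, %n its name.
  hdfs dfs -stat "%o %n" "$HDFS_FILE"
else
  echo "hdfs client not found; would upload with dfs.blocksize=$BLOCK_SIZE"
fi
```

The same `-D dfs.blocksize=...` option can be used when uploading several files in one `-put`, which covers the "set of files" case.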