HDFS Block Size

In hadoop data is stored as blocks. In every systems, blocks are the basic units. Hadoop block size is larger compared to the disk level and the os level block size, because hadoop is dealing with large data, so if small block size will result in more seek time and more metadata which ultimately results in poor performance. By default the block size in hadoop is 64 MB ( In some distributions, it is 128 MB , eg : Amazon EMR hadoop). We can change the block size based on our requirement. We can change the block size for a singe file , for a set of files or for the entire hdfs.

The property to set the block size is present in hdfs-site.xml. The propery name is dfs.blocksize (dfs.block.size was the old property name, this is deprecated) .

For checking the default  block size, we can use a simple command. This will print the default block size of an hdfs client.

This may not be the block size of the files stored in hdfs. If we are specifying a block size for a file while storing, it will be stored with that block size, else it will be stored with default block size.

> hdfs getconf -confKey dfs.blocksize

About amalgjose
I am an Electrical Engineer by qualification, now I am working as a Software Architect. I am very much interested in Electrical, Electronics, Mechanical and now in Software fields. I like exploring things in these fields. I love travelling, long drives and music.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: