Checksum calculation is an important step wherever we transfer files or data. The simplest way to verify that a file reached its destination intact is to compare the checksums of the source and target files. A checksum can be calculated in several ways. One is to treat the entire file as a single block. Another is multipart checksum calculation, where we calculate the checksum of each small chunk of the file and then compute an aggregated checksum from those.
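To make the multipart idea concrete, here is a minimal sketch. The aggregation scheme (an MD5 over the concatenated per-chunk digests), the function name multipart_md5 and the chunk size are my own assumptions for illustration; real systems may aggregate differently.

```python
import hashlib

def multipart_md5(data, chunk_size=8):
    """Hash each chunk, then hash the concatenated chunk digests.

    The aggregation scheme is an assumption made for this sketch.
    """
    chunk_digests = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        chunk_digests.append(hashlib.md5(chunk).digest())
    aggregated = hashlib.md5(b"".join(chunk_digests)).hexdigest()
    return aggregated, len(chunk_digests)

payload = b"hello checksum world"

# Single-block checksum over the whole payload
whole = hashlib.md5(payload).hexdigest()

# Multipart checksum over 8-byte chunks
multi, parts = multipart_md5(payload, chunk_size=8)

print(whole)
print(multi, parts)
```

Note that the two values differ even though the data is the same, so both sides of a transfer have to agree on the method and the chunk size before comparing checksums.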
Here I explain how to calculate the checksum of a file in the simplest way, using the hashlib library in Python.
Suppose I have a zip file at /home/coder/data.zip. Its checksum can be calculated as follows.
    import hashlib

    file_name = '/home/coder/data.zip'
    # Open in binary mode so the hash is computed over the raw bytes of the file
    with open(file_name, 'rb') as f:
        checksum = hashlib.md5(f.read()).hexdigest()
    print(checksum)
One common mistake I have seen is passing the file name directly to hashlib.md5() without opening the file, as in hashlib.md5(file_name).hexdigest().
In Python 2 this also returns a checksum (Python 3 raises a TypeError unless the string is first encoded to bytes), but it is the checksum of the file name string, not of the file's contents. So always open the file and pass its contents to the hash function, as in the first example.
This returns the actual checksum of the file's contents.
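The difference between the two calls can be seen in a small sketch. It writes a throwaway temporary file so that it runs anywhere; the helper name md5_of_file and the file contents are assumptions made for the illustration.

```python
import hashlib
import os
import tempfile

def md5_of_file(path):
    """Return the MD5 hex digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Create a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile(delete=False, suffix=".zip") as tmp:
    tmp.write(b"hello")
    path = tmp.name

# Mistake: this hashes the characters of the path, not the file
name_checksum = hashlib.md5(path.encode("utf-8")).hexdigest()

# Correct: this hashes the bytes stored in the file
content_checksum = md5_of_file(path)

os.remove(path)

print(name_checksum)
print(content_checksum)
```

The first value changes whenever the file is renamed or moved, while the second depends only on what is stored in the file.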
On Linux, you can also calculate the MD5 checksum with the md5sum command-line utility.
> md5sum file_name
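For large files, reading everything into memory at once may not be practical. The sketch below reads the file in blocks and feeds them to the same hash object, which should produce the same whole-file digest that md5sum prints. The function names md5sum and files_match and the block size are my own choices for illustration, not part of hashlib.

```python
import hashlib

def md5sum(path, block_size=65536):
    """Return the MD5 hex digest of a file, reading it block by block."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()

def files_match(source_path, target_path):
    """Compare two files by checksum, e.g. before and after a transfer."""
    return md5sum(source_path) == md5sum(target_path)
```

A call such as files_match('/home/coder/data.zip', '/backup/data.zip') would then tell you whether the copy is intact; the second path here is only a placeholder.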