Recently I got a tar.gz file of around 30 GB that expands to approximately 4 TB on extraction. I wanted to speed up the extraction, as the normal extraction was taking approximately a day. I searched a lot and finally found a solution.
The solution was pigz, a parallel implementation of gzip. Decompression itself cannot be parallelized, so it still happens in a single thread, but pigz uses separate threads for reading, writing and checksum calculation. Overall performance is far better than with normal gzip.
The command to install pigz on CentOS or RHEL is given below. Ensure the EPEL repository is enabled on your system.
yum install pigz
The command to extract a tar.gz file using pigz is given below.
pigz -dc compressed.tar.gz | tar xf -
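Before starting a long job on the real archive, it can help to rehearse the pipeline on a small throwaway archive. The following is only a minimal sketch: the names demo_src, demo.tar.gz and demo_out are made up for illustration, and it falls back to plain gzip if pigz is not installed.

```shell
set -eu

# Use pigz when available, otherwise fall back to gzip (both accept -d and -c).
if command -v pigz >/dev/null 2>&1; then GZ=pigz; else GZ=gzip; fi

# Scratch area with a tiny source tree and an empty target directory.
work=$(mktemp -d)
mkdir -p "$work/demo_src" "$work/demo_out"
echo "hello" > "$work/demo_src/a.txt"

# Create a small archive to practice on.
tar -C "$work" -cf - demo_src | "$GZ" -c > "$work/demo.tar.gz"

# Same pipeline as above; -C extracts into a chosen target directory.
"$GZ" -dc "$work/demo.tar.gz" | tar -xf - -C "$work/demo_out"

# Show the extracted file to confirm the round trip worked.
cat "$work/demo_out/demo_src/a.txt"
```

With GNU tar, the decompressor can also be named directly instead of using a pipe, for example tar -I pigz -xf compressed.tar.gz. This assumes GNU tar; other tar implementations may not support the same option.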
If you want to see the progress of the extraction, you can use Pipe Viewer (pv), a tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to show how quickly data is passing through and how long it has taken. When it knows the total size of the data, it also shows how near to completion it is and an estimate of the time remaining.
Pipe Viewer can be installed on CentOS or RHEL using the following command.
yum install pv
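Using pv, we can monitor the progress of the decompression process. Placed after pigz, pv sees only the decompressed stream and does not know its total size, so it reports throughput and elapsed time but no percentage or ETA.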
pigz -dc compressed.tar.gz | pv | tar xf -
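When pv sits after pigz, it has no total size to compare against. One possible variant lets pv read the compressed file itself, so the file size is known and a percentage and ETA can be displayed. The figure then reflects how much of the compressed file has been consumed, which is only an approximation of extraction progress. The helper name extract_with_progress below is hypothetical, and the gzip fallback and the path without pv are assumptions added so the sketch still runs where those tools are missing.

```shell
# extract_with_progress ARCHIVE [TARGET_DIR]   (hypothetical helper name)
extract_with_progress() {
    archive=$1
    target=${2:-.}

    # Prefer pigz, fall back to plain gzip (both accept -dc).
    if command -v pigz >/dev/null 2>&1; then gz=pigz; else gz=gzip; fi

    if command -v pv >/dev/null 2>&1; then
        # pv reads the compressed file, so it knows the total size
        # and can display a percentage and an ETA.
        pv "$archive" | "$gz" -dc | tar -xf - -C "$target"
    else
        # No pv installed: same extraction, just without the progress display.
        "$gz" -dc "$archive" | tar -xf - -C "$target"
    fi
}
```

It could then be called as extract_with_progress compressed.tar.gz /path/to/target, where the target directory must already exist.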