How to extract a tar.gz file quickly in Linux

Recently I got a tar.gz file of around 30 GB which, on extraction, grows to approximately 4 TB. I wanted to speed up the extraction, as a normal extraction was taking approximately a day. After a lot of searching I finally figured out a solution.

The solution was pigz, a parallel implementation of gzip. Decompression itself cannot be parallelized, so pigz performs it in a single thread, but it uses separate threads for reading, writing, and checksum calculation. The overall performance is still far better than normal gzip.

The command to install pigz on CentOS or RHEL is given below. Ensure the EPEL repository is enabled on your system.

yum install pigz
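
If the EPEL repository is not enabled yet, on CentOS it can usually be added with the command below; on RHEL the repository has to be enabled through your subscription channels or by installing the epel-release package from Fedora instead.

yum install epel-release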

The command to extract a tar.gz file using pigz is given below.

pigz -dc compressed.tar.gz | tar xf -
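
Here, -d tells pigz to decompress, -c writes the result to standard output, and tar xf - reads the archive from standard input. If your system has GNU tar, the same thing can be written more compactly by telling tar which decompression program to use (a minimal equivalent, assuming GNU tar's -I option is available):

tar -I pigz -xf compressed.tar.gz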

If you want to see the progress of the extraction, you can use pv (Pipe Viewer), a tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Pipe Viewer can be installed on CentOS or RHEL using the following command.

yum install pv

Using pv, we can monitor the progress of the decompression process.

pigz -dc compressed.tar.gz | pv | tar xf -
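
Note that pv placed in the middle of the pipeline only sees the decompressed stream, so it can show throughput but not a percentage, because it does not know the total size. If you let pv read the compressed file itself, it can stat the file, know its size, and show a progress bar and an ETA (a minimal variation of the same pipeline):

pv compressed.tar.gz | pigz -dc | tar xf -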


Compressing the output of Sqoop

The output of a Sqoop job can be compressed directly. A Sqoop job runs as a MapReduce job, so by setting the MapReduce output compression codec we can get compressed output from Sqoop. It is very simple: just add an argument to the sqoop command string (a complete example command is shown after the list of codecs below).

--compression-codec <compression codec>

For snappy compressed output the argument will be as below.

--compression-codec org.apache.hadoop.io.compress.SnappyCodec

For Gzip compression

--compression-codec org.apache.hadoop.io.compress.GzipCodec

For Bzip2 compression

--compression-codec org.apache.hadoop.io.compress.BZip2Codec
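
Putting it together, here is a minimal sketch of a Sqoop import with Snappy-compressed output. The JDBC URL, username, table name, and target directory are placeholders, and --compress is included explicitly since some Sqoop versions require it to enable compression alongside the codec setting.

sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --target-dir /user/hive/warehouse/mytable \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec

The resulting files in the target directory will typically carry a .snappy extension, which is an easy way to verify that the compression took effect.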