Program to compress a file in Snappy format

Hadoop supports various compression formats, and Snappy is one of them. I created a Snappy-compressed file using the Google Snappy library and used it in Hadoop, but it gave me an error saying the file was missing the Snappy identifier. I did a little research on this and found a workaround. The method I followed to find the solution was as follows.
I compressed the same file twice: once using the Google Snappy library and once using the Snappy codec that ships with Hadoop. I then compared the file sizes and checksums of the two outputs and found that they differ. The compressed file created with Hadoop's Snappy codec is a few bytes larger than the one created with Google Snappy; the extra bytes are framing metadata that the Hadoop codec writes around the compressed blocks.
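To reproduce the comparison, here is a rough sketch. It assumes the snappy-java library (org.xerial.snappy) as the Google Snappy implementation; the class name and the sample data are illustrative, not the exact files I used.

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;
import org.xerial.snappy.Snappy;

public class CompareSnappyOutputs {
    public static void main(String[] args) throws Exception {
        byte[] data = new byte[64 * 1024];
        Arrays.fill(data, (byte) 'a'); // trivially compressible sample input

        // Plain Google Snappy: one raw compressed block, no extra framing.
        byte[] raw = Snappy.compress(data);

        // Hadoop's SnappyCodec: a block stream with length headers,
        // which is where the extra bytes come from.
        SnappyCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, new Configuration());
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = codec.createOutputStream(buffer)) {
            out.write(data);
        }

        System.out.println("google snappy : " + raw.length + " bytes");
        System.out.println("hadoop snappy : " + buffer.size() + " bytes");
    }
}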
The code shown below (after the list of dependencies) will help you create a Snappy-compressed file that works correctly in Hadoop. It requires the following jars, all of which are available in your Hadoop installation:
1) hadoop-common.jar

2) guava-xx.jar

3) log4j.jar

4) commons-collections.jar

5) commons-logging.x.x.x.jar
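
A minimal sketch of such a program is given below. The class name SnappyCompress and the argument handling are my own illustrative choices; the codec calls are the standard Hadoop compression API.

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class SnappyCompress {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: SnappyCompress <input file> <output file>");
            System.exit(1);
        }

        Configuration conf = new Configuration();
        // The Hadoop Snappy codec writes the block framing (the extra
        // metadata mentioned above) that Hadoop expects to find.
        CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);

        // Local file system; the same codec calls work on HDFS too.
        FileSystem fs = FileSystem.getLocal(conf);
        try (InputStream in = fs.open(new Path(args[0]));
             OutputStream out = codec.createOutputStream(fs.create(new Path(args[1])))) {
            IOUtils.copyBytes(in, out, 4096);
        }
    }
}

Compile it with the jars above on the classpath. Note that SnappyCodec also needs a native Hadoop build with Snappy support (libhadoop and libsnappy on java.library.path); without it you will get the "native snappy library not available" error that appears in the comments below.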

You can download the full code directly from GitHub.

About amalgjose
I am an Electrical Engineer by qualification and now work as a Software Architect. I am very interested in the Electrical, Electronics, and Mechanical fields, and now in Software as well, and I like exploring things in these areas. I love travelling, long drives, and music.

9 Responses to Program to compress a file in Snappy format

  1. Shashu says:

    Hi Amal
    Is there any Hadoop command to convert input data into Snappy-compressed format?

  2. sharath.abhishek@gmail.com says:

    When I run the program I see this error. Any idea on this?

    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.io.compress.CodecPool.getCompressor(Lorg/apache/hadoop/io/compress/CompressionCodec;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/compress/Compressor;
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
    at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:98)
    at hive.HiveJdbcClient.main(HiveJdbcClient.java:34)

  3. Xavier says:

    Amal, I am new to Hadoop. The above code helped me compress a file in the local file system. I want to do the same in HDFS. Could you please share a code snippet?

  4. DMP Falcon says:

    Can you confirm the versions of the jar files you used? I'm getting the error below:
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
    org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
    org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:100)
    com.snappy.codec.CreateSnappy.main(CreateSnappy.java:35)
    Do you have any idea on this?

  5. avi says:

    Hi Amal,
    Could you please explain in detail what Snappy is exactly.
    How will it improve performance, and will it also compress jars?
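
Regarding Xavier's question above: the HDFS version differs only in which FileSystem the streams are opened on. A minimal sketch, assuming fs.defaultFS points at the cluster (for example via a core-site.xml on the classpath):

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class SnappyCompressHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem.get(conf) resolves to HDFS when fs.defaultFS is an
        // hdfs:// URI; the local version used FileSystem.getLocal(conf).
        FileSystem fs = FileSystem.get(conf);

        CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);
        // Read the HDFS input path and write a Snappy-framed output path.
        try (InputStream in = fs.open(new Path(args[0]));
             OutputStream out = codec.createOutputStream(fs.create(new Path(args[1])))) {
            IOUtils.copyBytes(in, out, 4096);
        }
    }
}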
