Program to compress a file in Snappy format

Hadoop supports various compression formats, and Snappy is one of them. I created a Snappy-compressed file using the Google Snappy library and tried to use it in Hadoop, but Hadoop gave me an error saying the file was missing the Snappy identifier. After a little research I found a workaround. The method I followed to find the solution was as follows.
I compressed the same file twice: once with the Google Snappy library and once with the Snappy codec bundled with Hadoop. I then compared the sizes and checksums of the two files and found that they differ: the file produced by Hadoop's Snappy codec is a few bytes larger than the one produced by Google Snappy. The extra bytes are metadata that Hadoop's codec writes into the compressed stream.
The code below will help you create a Snappy-compressed file that works correctly in Hadoop. It requires the following dependent jars, all of which are available in your Hadoop installation.
1) hadoop-common.jar

2) guava-xx.jar

3) log4j.jar

4) commons-collections.jar

5) commons-logging.x.x.x.jar

You can download the code directly from GitHub.
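The original code listing did not survive here, but a minimal sketch of the approach described above might look like the following (class and file names are hypothetical; it assumes hadoop-common is on the classpath and the native Snappy library is available to Hadoop):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.SnappyCodec;

public class SnappyCompress {
    public static void main(String[] args) throws Exception {
        String inFile = args[0];   // e.g. input.txt
        String outFile = args[1];  // e.g. input.txt.snappy

        Configuration conf = new Configuration();
        SnappyCodec codec = new SnappyCodec();
        codec.setConf(conf); // the codec needs a Configuration before use

        // Wrapping the raw output stream with the codec writes the
        // Hadoop-specific framing metadata discussed above, so the
        // resulting file is readable by Hadoop jobs.
        try (InputStream in = new FileInputStream(inFile);
             OutputStream out = codec.createOutputStream(new FileOutputStream(outFile))) {
            IOUtils.copyBytes(in, out, 4096);
        }
    }
}
```

To run it, compile against hadoop-common and launch with the Hadoop classpath, e.g. `java -cp "$(hadoop classpath):." SnappyCompress input.txt input.txt.snappy`. Note that `SnappyCodec` relies on the native libhadoop/libsnappy libraries, which is exactly what the "native snappy library not available" error in the comments below is about.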

About amalgjose
I am an Electrical Engineer by qualification and now work as a Software Engineer. I am very interested in the Electrical, Electronics, and Mechanical fields, and now in Software as well, and I like exploring things in these areas. I enjoy travelling, long drives, and music.

7 Responses to Program to compress a file in Snappy format

  1. Shashu says:

    Hi Amal
    Is there any hadoop command to convert input data into snappy compressed format?

  2. sharath.abhishek@gmail.com says:

    When I run the program I see this error. Any idea on this?

    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.io.compress.CodecPool.getCompressor(Lorg/apache/hadoop/io/compress/CompressionCodec;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/io/compress/Compressor;
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
    at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:98)
    at hive.HiveJdbcClient.main(HiveJdbcClient.java:34)

  3. DMP Falcon says:

    Can you confirm the versions of the jar files you used? I'm getting the error below:
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
    org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
    org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:100)
    com.snappy.codec.CreateSnappy.main(CreateSnappy.java:35)
    Do you have any idea on this?
