Hadoop supports various compression formats, and Snappy is one of them. I created a Snappy-compressed file using the Google Snappy library and tried to use it in Hadoop, but Hadoop gave me an error saying that the file is missing the Snappy identifier. I did a little research on this and found a workaround. This is how I arrived at the solution.
I compressed the same file twice: once with the Google Snappy library and once with the Snappy codec that ships with Hadoop. I compared the file sizes and checksums of the two outputs and found that they differ. The file created by the Hadoop Snappy codec is a few bytes larger than the one created by Google Snappy. The extra bytes are additional metadata that Hadoop writes into the file.
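The size and checksum comparison described above can be reproduced with a few lines of plain Java. This is only a minimal sketch and not the tool I originally used: the class name `CompareCompressedFiles` and its helper methods are made up for illustration, and CRC32 is assumed as the checksum (any other checksum would serve equally well).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper for illustration: prints size and CRC32 of two files.
class CompareCompressedFiles {

    // Computes the CRC32 checksum of a file by streaming its contents.
    static long crc32Of(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int readCount;
            while ((readCount = in.read(buffer)) > 0) {
                crc.update(buffer, 0, readCount);
            }
        }
        return crc.getValue();
    }

    // Builds a one-line summary (name, size, checksum) for a file.
    static String describe(Path file) throws IOException {
        return String.format("%s: %d bytes, crc32=%08x",
                file.getFileName(), Files.size(file), crc32Of(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Enter <file1> <file2>");
            return;
        }
        Path first = Paths.get(args[0]);
        Path second = Paths.get(args[1]);
        System.out.println(describe(first));
        System.out.println(describe(second));
        System.out.println("Size difference (second - first): "
                + (Files.size(second) - Files.size(first)) + " bytes");
    }
}
```

Run it with the Google Snappy output as the first argument and the Hadoop Snappy output as the second. A positive size difference corresponds to the extra bytes mentioned above.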
The code shown below creates a Snappy-compressed file that works in Hadoop. It requires the following dependent jars, all of which are available in your Hadoop installation.
1) hadoop-common.jar
2) guava-xx.jar
3) log4j.jar
4) commons-collections.jar
5) commons-logging.x.x.x.jar

You can also download the code directly from GitHub.

    package com.snappy.codec;

    /*
     * @author : Amal G Jose
     */

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    /**
     * This program compresses the given file in snappy format
     */
    public class CreateSnappy {

        public static void main(String[] args) {
            if (args.length < 2) {
                System.out.println("Enter <input> <output>");
                System.exit(1);
            }
            try {
                // Instantiate Hadoop's Snappy codec with a default configuration
                CompressionCodec codec = (CompressionCodec) ReflectionUtils
                        .newInstance(SnappyCodec.class, new Configuration());
                // args[1] is the output file; the codec wraps it in a compressing stream
                OutputStream outStream = codec
                        .createOutputStream(new BufferedOutputStream(
                                new FileOutputStream(args[1])));
                // args[0] is the uncompressed input file
                InputStream inStream = new BufferedInputStream(new FileInputStream(
                        args[0]));
                int readCount = 0;
                byte[] buffer = new byte[64 * 1024];
                // Copy the input through the codec in 64 KB chunks
                while ((readCount = inStream.read(buffer)) > 0) {
                    outStream.write(buffer, 0, readCount);
                }
                // Closing the output stream flushes the remaining compressed data
                inStream.close();
                outStream.close();
                System.out.println("File Compressed");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

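To get a rough look at the extra metadata in a file written through the Hadoop codec, you can read the first bytes of the output with plain Java. The sketch below rests on an assumption about Hadoop's block framing that I have not confirmed in this post: that each block begins with a 4-byte big-endian uncompressed length, followed by a 4-byte big-endian length of the first compressed chunk. The class name `SnappyHeaderPeek` and its method are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration: prints the first two 4-byte integers of a file.
class SnappyHeaderPeek {

    // Reads two big-endian 32-bit integers from the start of the stream.
    // Throws EOFException if the stream holds fewer than 8 bytes.
    static int[] readLeadingInts(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int first = data.readInt();
        int second = data.readInt();
        return new int[] { first, second };
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("Enter <compressed file>");
            return;
        }
        try (InputStream in = new BufferedInputStream(
                Files.newInputStream(Paths.get(args[0])))) {
            int[] header = readLeadingInts(in);
            // ASSUMPTION: these labels follow the block framing described in the lead-in.
            System.out.println("First int (assumed uncompressed block length): " + header[0]);
            System.out.println("Second int (assumed compressed chunk length): " + header[1]);
        }
    }
}
```

If the assumption holds, the first value for a small input should match the size of the original uncompressed file. A file written directly by the Google Snappy library would not carry this framing, which would be consistent with the size difference described earlier.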