Changing the data type mapping in Sqoop

Sqoop is very helpful for importing data from an RDBMS into Hadoop. The Hive import feature creates a Hive table corresponding to the RDBMS table and imports the data into it. By default, Sqoop creates the Hive table using the data type conversion logic built into Sqoop. We have an option to change this default conversion and explicitly specify the data type required in the Hive table. This is possible by adding one of the options below.

--map-column-java <mapping> Override mapping from SQL to Java type for configured columns.
--map-column-hive <mapping> Override mapping from SQL to Hive type for configured columns.

For example, if we have a column called 'id' of integer type in the SQL table and we want it as a string column in Hive, we can add the following option.

sqoop import ... --map-column-hive id=string
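As a rough sketch of how this fits into a full command, assuming a hypothetical MySQL source (the JDBC URL, database, table, and username below are placeholders), an import might look like the following.

sqoop import \
  --connect jdbc:mysql://dbserver/sales \
  --username sqoop_user -P \
  --table customers \
  --hive-import \
  --hive-table customers \
  --map-column-hive id=string

Similarly, for a plain HDFS import the Java-side mapping can be overridden with --map-column-java id=String, which forces the column to be handled as a Java String.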

Compressing the output of Sqoop

The output of a Sqoop job can be compressed directly. A Sqoop job runs as a MapReduce job, so by setting the MapReduce output compression codec we can get the output of Sqoop compressed. It is very simple: just add the following argument to the Sqoop command (a full example is shown at the end of this section).

--compression-codec <compression codec>

For Snappy-compressed output, the argument will be as below.

--compression-codec org.apache.hadoop.io.compress.SnappyCodec

For Gzip compression

--compression-codec org.apache.hadoop.io.compress.GzipCodec

For Bzip2 compression

--compression-codec org.apache.hadoop.io.compress.BZip2Codec
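Putting it together, here is a sketch of a complete import with Snappy-compressed output, using the same hypothetical MySQL source and placeholder paths as above. Note that depending on the Sqoop version and configuration, you may also need to enable compression explicitly with the -z/--compress flag alongside the codec, as done here.

sqoop import \
  --connect jdbc:mysql://dbserver/sales \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/hadoop/customers_snappy \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec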