How to convert or change the data type of columns in a Pandas dataframe?

Changing the data type of columns in a pandas dataframe is very easy. Here I am using the astype() function to perform the typecast operation. Refer to the following example.
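Here is a minimal sketch of the typecast (the dataframe and its values are hypothetical, just for illustration):

import pandas as pd

# Hypothetical dataframe; the salary column is read in as strings
df = pd.DataFrame({'emp_id': [101, 102], 'salary': ['50000', '60000']})
print(df.dtypes)   # salary shows as object

# astype() returns a new dataframe with the converted column type
df = df.astype({'salary': 'int'})
print(df.dtypes)   # salary is now int64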


You can list as many columns as you want in the dictionary to typecast them in a single call. For example, if you want to typecast the columns emp_id and salary, use the following syntax.

> df = df.astype({'salary': 'int', 'emp_id': 'int'})
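You can confirm the conversion by inspecting the dtypes attribute:

> print(df.dtypes)

Note that astype() raises a ValueError if a value cannot be converted; for messy data, pd.to_numeric(df['salary'], errors='coerce') is a more forgiving alternative that turns bad values into NaN.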


Changing the data type mapping in Sqoop

Sqoop is very helpful in importing data from an RDBMS into Hadoop. The Hive import feature creates a Hive table corresponding to the RDBMS table and imports the data. By default, Sqoop creates the Hive table based on predefined data type conversion logic built into Sqoop. We have an option to change this default conversion and explicitly specify the data type required in the Hive table. This is possible by adding one of the extra options below.

--map-column-java <mapping> Override mapping from SQL to Java type for configured columns.
--map-column-hive <mapping> Override mapping from SQL to Hive type for configured columns.
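For example, the Java option accepts comma-separated mappings, as in the following (the column names here are illustrative):

sqoop import ... --map-column-java id=String,value=Integer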

For example, if we have a field called id in an SQL table which is of integer data type and we want it as a string data type column in Hive, we can add the following option.

sqoop import ... --map-column-hive id=string
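Putting it together, a complete import command might look like the sketch below. The connection string, credentials, and table names are placeholders; substitute your own. All of the flags shown (--connect, --username, -P, --table, --hive-import, --hive-table, --map-column-hive) are standard Sqoop options.

sqoop import \
  --connect jdbc:mysql://dbserver/payroll \
  --username dbuser -P \
  --table employees \
  --hive-import \
  --hive-table employees \
  --map-column-hive id=string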