MapReduce – textinputformat.record.delimiter

Date: April 25, 2013
Author: Amal G Jose

The default input format of Hadoop MapReduce is TextInputFormat, which reads plain text files.

The default record delimiter is the newline character ‘\n’. This means the input is read line by line, and each line reaches the mapper as one record.
Reading line by line does not suit every case, however. So we can make it split records on our own delimiter rather than the default ‘\n’.
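To make the difference concrete, here is a minimal plain-Java sketch (no Hadoop involved) of how the same input breaks into different records depending on the delimiter. The class name RecordSplitDemo, the method splitRecords and the "||" delimiter are all made up for illustration; Hadoop's own record reader works on file splits and is more involved than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: mimics how a record delimiter decides where one record ends.
class RecordSplitDemo {

    // Splits the input on the literal delimiter string (not treated as a regex).
    static List<String> splitRecords(String input, String delimiter) {
        List<String> records = new ArrayList<>();
        for (String piece : input.split(Pattern.quote(delimiter))) {
            records.add(piece);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two logical records, each spanning two lines, separated by "||".
        String input = "id=1\nname=a||id=2\nname=b";

        // Splitting on newline cuts through the logical records.
        System.out.println(splitRecords(input, "\n"));

        // Splitting on the custom delimiter keeps each logical record whole.
        System.out.println(splitRecords(input, "||"));
    }
}
```

With the newline delimiter the second piece mixes the end of one logical record with the start of the next, which is exactly the situation a custom delimiter avoids.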

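This can be done by setting the property textinputformat.record.delimiter.

This property can be set either in the program or on the command line (CLI) when running the program. On the command line it is passed as a generic option, -D textinputformat.record.delimiter=<delimiter>, which works when the driver parses generic options (for example through ToolRunner).

To set it in the program (the Driver class), call conf.set("textinputformat.record.delimiter", "delimiter") on the job's Configuration object, where "delimiter" stands for your own delimiter string.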
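A driver along these lines shows where the call fits. This is only a sketch under a few assumptions: it uses the newer org.apache.hadoop.mapreduce API, the class name CustomDelimiterDriver and the "||" delimiter are made up for the example, and the job is map-only with the default identity mapper so that each record is simply written back out. Whether the property is honoured depends on the Hadoop version in use.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: reads records separated by a custom delimiter.
public class CustomDelimiterDriver extends Configured implements Tool {

    private static final String DELIMITER_KEY = "textinputformat.record.delimiter";

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        // Keep a value passed with -D on the command line; otherwise fall back
        // to "||" (an assumed delimiter, replace it with your own).
        if (conf.get(DELIMITER_KEY) == null) {
            conf.set(DELIMITER_KEY, "||");
        }

        // Set the property before creating the Job, since the Job takes its
        // own copy of the configuration.
        Job job = Job.getInstance(conf, "custom record delimiter");
        job.setJarByClass(CustomDelimiterDriver.class);
        job.setInputFormatClass(TextInputFormat.class);

        // Map-only job with the default (identity) mapper: key is the byte
        // offset of the record, value is the record text.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options such as -D key=value.
        System.exit(ToolRunner.run(new Configuration(), new CustomDelimiterDriver(), args));
    }
}
```

Because the driver goes through ToolRunner, the same delimiter could instead be given at run time, for example hadoop jar myjob.jar CustomDelimiterDriver -D textinputformat.record.delimiter='||' <input> <output>, where myjob.jar is a placeholder jar name.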

One thought on “Mapreduce– textinputformat.record.delimiter”


jeeva ganesan says:

September 16, 2014 at 12:31 am

Hi. I am trying to implement my algorithm as a map function in Hadoop MapReduce. I have written the algorithm in Python and am trying to run it through Hadoop Streaming, but the algorithm needs a particular chunk of the data, rather than a single line, for each call of the map function, so I am adding a delimiter to the dataset. If I want to set that delimiter for Hadoop, do I still need to write a Java program for that?


All About Tech

Victory goes to the player who makes the next-to-last mistake
