Pig – Local and Distributed Execution modes

Pig currently has two execution environments:

  • Local execution in a single JVM
  • Distributed execution on a Hadoop cluster.

Local mode

In local mode, Pig runs in a single JVM and uses the local file system as its execution environment. Local mode does not need a Hadoop cluster. The execution type is set using the -x or -exectype option, so to enter local execution mode, type the command below in the terminal. You will see output similar to the following and then enter the Grunt shell. The INFO logs show that Pig is using the local file system (file:///).


pig -x local

 

2013-07-10 16:46:56,344 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0-cdh4.1.2 (rexported) compiled Nov 01 2012, 18:38:58

2013-07-10 16:46:56,345 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/amal_george/pig_1373455016342.log

2013-07-10 16:46:56,500 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at:file:///

grunt>
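
Local mode can also run a script file instead of opening the Grunt shell. The following is a minimal sketch, not taken from the original setup: the file names people.txt and people.pig, the sample rows, and the parameter name input are all hypothetical. It assumes the default loader splits fields on tabs, and it only calls pig if the command is found on the PATH.

```shell
# Scratch directory so no existing files are overwritten.
workdir="$(mktemp -d)"

# Hypothetical sample input: name and country, tab-separated.
printf 'Julie\tIndia\nAmal\tIndia\n' > "$workdir/people.txt"

# Hypothetical Pig Latin script: load the file and print every row.
# $input is filled in by Pig's -param option, not by the shell,
# which is why the here-document delimiter is quoted.
cat > "$workdir/people.pig" <<'EOF'
people = LOAD '$input' AS (name:chararray, country:chararray);
DUMP people;
EOF

# Run the script in local mode only when pig is installed.
if command -v pig >/dev/null 2>&1; then
  pig -x local -param input="$workdir/people.txt" "$workdir/people.pig"
else
  echo "pig not found on PATH; skipping the run"
fi
```

Because the script takes its input path as a parameter, the same file could later be pointed at an HDFS path without changing the Pig Latin itself.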

Distributed Mode

On a machine with Pig installed, typing pig in the terminal starts distributed execution mode by default. In distributed mode, jobs run as MapReduce jobs and use HDFS as the file system, so a Hadoop cluster is needed to run Pig in this mode.

When you type pig in the terminal, you will see output similar to the following and then enter the Grunt shell. The INFO logs show that Pig is connecting to a cluster: the file system is at hdfs://master:9000 and the MapReduce job tracker is at master:9001.

2013-07-10 16:47:52,510 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0-cdh4.1.2 (rexported) compiled Nov 01 2012, 18:38:58

2013-07-10 16:47:52,511 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/amal_george/pig_1373455072507.log
2013-07-10 16:47:52,797 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000

2013-07-10 16:47:53,487 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:9001

grunt>
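
The distributed mode can also be requested explicitly with -x mapreduce, which is the same as typing plain pig. As a small illustration of the two modes covered here, the sketch below defines a hypothetical helper, pig_command, that only prints the command line for a given mode and defaults to mapreduce. The helper name and its error message are assumptions made for this example; it does not start Pig.

```shell
# Hypothetical helper: print the pig command line for an execution mode.
# With no argument it defaults to mapreduce, mirroring plain `pig`.
pig_command() {
  mode="${1:-mapreduce}"
  case "$mode" in
    local|mapreduce)
      echo "pig -x $mode"
      ;;
    *)
      # Only the two modes described in this post are accepted.
      echo "unsupported execution mode: $mode" >&2
      return 1
      ;;
  esac
}

pig_command          # prints: pig -x mapreduce
pig_command local    # prints: pig -x local
```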

Customizing the auto-completion mechanism for Pig

One handy feature of Pig's Grunt shell is its completion mechanism, which tries to complete Pig Latin keywords and functions when you press the Tab key. For example, consider the following incomplete line:

grunt> a = foreach b ge

If you press the Tab key at this point, ge will expand to generate, a Pig Latin keyword:

grunt> a = foreach b generate

We can customize the completion tokens by putting the tokens we need in a file named autocomplete and placing it on the Pig classpath or in the directory from which we invoke the Grunt shell.

For example, I created a file named autocomplete containing the following tokens:

Julie

India

Software

Engineer

Hadoop

Bigdata

After saving this file, if you type the first letter of a token in the Grunt shell and press Tab, it will display the choices for autocompletion.
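
The autocomplete file can also be created straight from the terminal. This is a minimal sketch, assuming one token per line as in the listing above and assuming the Grunt shell will be started from the same directory:

```shell
# Write one completion token per line to a file named "autocomplete"
# in the current directory.
cat > autocomplete <<'EOF'
Julie
India
Software
Engineer
Hadoop
Bigdata
EOF

# Show the tokens that were written.
cat autocomplete
```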

Note: The tokens above are not related to Pig commands or functions; they are only an example. In the same way, you can add your own custom tokens to make scripting handier.