Simple Word Count using Apache Pig

Date: May 31, 2014
Author: Amal G Jose

The basic "hello world" program in Hadoop is the word count program. I have explained word count implementations using Java MapReduce and Hive queries in my previous posts. Here I explain how to implement the same basic word count logic using a Pig script.
TOKENIZE is a built-in function in Apache Pig that splits a line into a bag of words.
FLATTEN (an operator that un-nests that bag into one row per word) and COUNT (a built-in function) are also provided by Apache Pig.
You can check the output of each step by adding a DUMP command after it.
The Pig script below counts the words in the input file and sorts the result by count in ascending order.

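A = load 'data.txt' as (line:chararray);          -- read each input line as a single chararray field
B = foreach A generate TOKENIZE(line) as tokens;  -- split each line into a bag of words
C = foreach B generate flatten(tokens) as words;  -- un-nest the bag: one row per word
D = group C by words;                             -- collect identical words together
E = foreach D generate group, COUNT(C);           -- emit (word, number of occurrences)
F = order E by $1;                                -- sort by the count, ascending
dump F;                                           -- print the result to the console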
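
To see what each step produces, the first few relations can be inspected one at a time. This is only a minimal sketch, assuming the same data.txt and an interactive Grunt session. DESCRIBE is not used in the script above; it is Pig's operator for printing the schema of a relation, added here purely for illustration.

```pig
-- Sketch only: inspect the intermediate relations of the word count script.
-- Assumes the same 'data.txt' input as the script above.
A = load 'data.txt' as (line:chararray);
describe A;   -- show the schema of A (a single chararray field named line)

B = foreach A generate TOKENIZE(line) as tokens;
describe B;   -- tokens is a bag of word tuples
dump B;       -- one bag of words per input line

C = foreach B generate flatten(tokens) as words;
dump C;       -- after flatten: one word per row
```

Each dump runs the statements needed to produce that relation, so on a large input it may be more practical to try this on a small sample file first.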

2 thoughts on “Simple Word Count using Apache Pig”

Ankit Gupta says:

April 22, 2016 at 9:02 pm

Hello, I am looking for more help in understanding how to do the same for a character count in Pig.

Ramesh Khade says:

July 1, 2016 at 6:41 am

Thanks for sharing this information


All About Tech
