Simple Tag Cloud Generation Using a Java Program

A tag cloud is a visual representation of text data in which the tags are words whose importance is highlighted using colour or font size. Tag clouds are now very popular for analysing the content of websites, because they help in quickly perceiving the most important words. The importance is calculated by counting the number of occurrences of each word; based on this count, a weight is given to each word (tag). After the whole text is analysed, each word is displayed according to its weight, and thus the tag cloud is generated. OpenCloud is a Java library for generating tag clouds, and I used it here. Normally we need a web server to get a good UI for the tag cloud; here we display the cloud using Swing. Below is a sample program for generating a simple tag cloud. To run it, download the OpenCloud library and add it to the classpath.

package tagcloud;

import java.util.Random;

import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

import org.mcavallo.opencloud.Cloud;
import org.mcavallo.opencloud.Tag;

public class TestOpenCloud {

private static final String[] WORDS = { "amal", "india", "hello", "amal", "birthday", "amal", "hello", "california", "america", "software",
 "cat", "bike", "car", "christmas", "city", "zoo", "amal", "asia", "family", "festival", "flower", "flowers", "food",
 "little", "friends", "fun", "amal", "outing", "india", "weekend", "india", "software", "me", "music", "music", "music",
 "new", "love", "night", "nikon", "morning", "love", "park", "software", "people", "portrait", "flower", "sky", "travelling",
 "spain", "summer", "sunset", "india", "city", "india", "amal", "uk", "usa", "", "water", "wedding","cool","happy","friends","best","trust","good",
 "enjoy","cry","laugh"};

protected void initUI() {
    JFrame frame = new JFrame(TestOpenCloud.class.getSimpleName());
    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    JPanel panel = new JPanel();
    Cloud cloud = new Cloud();
    Random random = new Random();
    // Add each word to the cloud a random number of times (0-49)
    // so that the tags get different weights.
    for (String s : WORDS) {
        for (int i = random.nextInt(50); i > 0; i--) {
            cloud.addTag(s);
        }
    }
    // Render each tag as a JLabel whose font size is proportional to its weight.
    for (Tag tag : cloud.tags()) {
        final JLabel label = new JLabel(tag.getName());
        label.setOpaque(false);
        label.setFont(label.getFont().deriveFont((float) tag.getWeight() * 10));
        panel.add(label);
    }
    frame.add(panel);
    frame.setSize(800, 600);
    frame.setVisible(true);
}

public static void main(String[] args) {
    // Build the UI on the Swing event dispatch thread.
    SwingUtilities.invokeLater(new Runnable() {
        @Override
        public void run() {
            new TestOpenCloud().initUI();
        }
    });
}

}

Accessing Facebook Using a Java Program

Here is a simple Java application for accessing Facebook. It uses restfb, a simple and flexible Facebook Graph API and Old REST API client written in Java.

For this we need an access token, which we can generate using the Graph API Explorer on the Facebook Developers website.

Download the restfb jar.

Get an access token with the necessary permissions.

package com.amal.fb;

import com.restfb.DefaultFacebookClient;
import com.restfb.FacebookClient;
import com.restfb.Parameter;
import com.restfb.types.FacebookType;
import com.restfb.types.User;

public class FacebookAppnew {
    public static void main(String[] args) {
        // Set these only if you are behind a proxy; otherwise remove them.
        System.setProperty("https.proxyHost", "PROXY");
        System.setProperty("https.proxyPort", "PORT");

        FacebookClient fbClient = new DefaultFacebookClient("MY_ACCESS_TOKEN");
        User user = fbClient.fetchObject("me", User.class);

        // Getting the details from Facebook
        System.out.println("UserName :" + user.getName());
        System.out.println("Birthday :" + user.getBirthday());
        System.out.println("Bio :" + user.getBio());
        System.out.println("Email :" + user.getEmail());

        // Making a post on Facebook
        FacebookType publishMessageResponse =
                fbClient.publish("me/feed", FacebookType.class,
                        Parameter.with("message", "Good Evening"));
        System.out.println("Published post id: " + publishMessageResponse.getId());
    }
}

Similarly, we can post photos, videos, etc., as shown in the sketch below.
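For example, a photo can be published by uploading its bytes as a binary attachment. The following is only a rough sketch: the file path, caption, and access token are placeholders, and it assumes restfb's BinaryAttachment class together with Java NIO for reading the file.

import java.nio.file.Files;
import java.nio.file.Paths;

import com.restfb.BinaryAttachment;
import com.restfb.DefaultFacebookClient;
import com.restfb.FacebookClient;
import com.restfb.Parameter;
import com.restfb.types.FacebookType;

public class FacebookPhotoPost {
    public static void main(String[] args) throws Exception {
        FacebookClient fbClient = new DefaultFacebookClient("MY_ACCESS_TOKEN");

        // Read the photo from disk (the path is just an example).
        byte[] photoBytes = Files.readAllBytes(Paths.get("/path/to/photo.jpg"));

        // Publish the photo to the user's photos with a caption.
        FacebookType response = fbClient.publish("me/photos", FacebookType.class,
                BinaryAttachment.with("photo.jpg", photoBytes),
                Parameter.with("message", "My photo"));

        System.out.println("Published photo id: " + response.getId());
    }
}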

Fetching a Webpage using Java

This is a simple Java program to fetch a webpage and print its HTML to the console.


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class Test {

    public static void main(String[] args) {
        URL url;
        InputStream is = null;
        BufferedReader br;
        String line;

        try {
            url = new URL("http://www.wikipedia.org/");
            is = url.openStream(); // throws an IOException
            br = new BufferedReader(new InputStreamReader(is));

            // Print the page line by line.
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException mue) {
            mue.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } finally {
            try {
                if (is != null) {
                    is.close();
                }
            } catch (IOException ioe) {
                // nothing to see here
            }
        }
    }
}

MapReduce – textinputformat.record.delimiter

The default input format of Hadoop MapReduce is TextInputFormat, which means it reads plain text files.

The default record delimiter is '\n', so the input is read line by line.
But reading line by line may not be suitable in all cases, so we can make it read records based on our own delimiter rather than the default '\n'.

This can be done by setting the property textinputformat.record.delimiter.

This property can be set either in the program or on the command line while running the job (with the -D option, provided the driver parses generic options).

The format for setting it in the program (Driver class) is conf.set("textinputformat.record.delimiter", "<delimiter>").
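As a small sketch (the delimiter value and class name here are only illustrative), setting a custom record delimiter in the Driver could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CustomDelimiterDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Treat a blank line (two consecutive newlines) as the record separator,
        // so each paragraph becomes one record passed to the mapper.
        conf.set("textinputformat.record.delimiter", "\n\n");

        Job job = new Job(conf, "custom-delimiter-job");
        // The rest of the job setup (mapper, reducer, input and output paths)
        // is the same as in the WordCount driver shown below.
        System.out.println("Record delimiter: "
                + job.getConfiguration().get("textinputformat.record.delimiter"));
    }
}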

A Simple Mapreduce Program – Wordcount

Hello World is the trial program for almost all programming languages. Likewise, for Hadoop MapReduce the trial program is WordCount, the basic and simplest MapReduce program. This program gives a good understanding of the parallel processing in Hadoop.

It consists of three classes:

1) Driver class - the main class

2) Mapper class - which does the map function

3) Reducer class - which does the reduce function

 

Driver Class

 

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCountDriver {

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -files, ...) and keep the job arguments.
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        args = parser.getRemainingArgs();

        Job job = new Job(conf, "wordcount");

        job.setJarByClass(WordCountDriver.class);

        // Types of the key-value pairs emitted by the job.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input path and args[1] is the output path in HDFS.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Submit the job and wait for it to finish.
        System.out.println(job.waitForCompletion(true));
    }
}

 

Mapper Class

 

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();
    private final static IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the incoming line into words and emit (word, 1) for each of them.
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}

 

Reducer Class

 

import java.io.IOException;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
 
public class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        // Sum up all the 1s emitted for this word and write the total count.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

The jars necessary for this code should be taken from the same version of the Hadoop package that is installed on the cluster. If the versions differ, it will result in errors.
Here the Mapper class reads the input file line by line. Inside the Mapper we convert each line to a string and then tokenize it into words, i.e. each line is split into individual words. The output of the Mapper is given to the Reducer, and it is in the form of key-value pairs.
The context.write method gives a key-value pair to the Reducer. Here the key is the word and the value is "one", a variable holding the value 1.

In the Reducer, we merge these pairs and sum the counts attached to identical words.
For example, if we give an input file such as

Hi all I am fine
Hi all I am good
hello world

After Mapper class the output will be,

Hi 1
all 1
I 1
am 1
fine 1
Hi 1
all 1
I 1
am 1
good 1
hello 1
world 1

After Reducer class the output will be

Hi 2
all 2
I 2
am 2
fine 1
good 1
hello 1
world 1

Processes started and running fine, but not listed in jps

Today I met a strange issue while installing Hadoop on a Linux machine.

I started all the processes and they were working fine, but they were not listed in jps. I struggled with this issue for some time; later I found the solution. The reason was that some files like hsperfdata_<username> were present in the /tmp folder. I deleted those files, and after that jps listed all the running processes. 🙂

Actually, this hsperfdata_<username> directory is a feature, not a bug. It is a log directory created by the JVM while it runs, and it is part of the Java performance counters. By default this folder is created inside the tmp folder of the operating system.
The JVM uses this folder for process monitoring.
It contains perfdata files, one per Java process id, for the Java processes started by the user named in the hsperfdata folder name.
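As a small illustrative sketch (assuming the default temp directory and the current user name; the exact location is not guaranteed on every system), the directory and its per-process files can be listed from Java like this:

import java.io.File;

public class HsperfdataList {
    public static void main(String[] args) {
        // hsperfdata_<username> normally lives in the system temp directory.
        String tmpDir = System.getProperty("java.io.tmpdir");
        String userName = System.getProperty("user.name");
        File perfDir = new File(tmpDir, "hsperfdata_" + userName);

        System.out.println("Looking in: " + perfDir.getAbsolutePath());
        File[] files = perfDir.listFiles();
        if (files != null) {
            // Each file name is the pid of a running Java process owned by this user.
            for (File f : files) {
                System.out.println("Java process id: " + f.getName());
            }
        } else {
            System.out.println("Directory not found.");
        }
    }
}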