Bubble chart using Python

A bubble chart is a powerful and useful chart for representing data with three or four dimensions.

The position of each bubble is determined by its x and y axis values. These are the first two properties.

The size of the bubble can be controlled by the third property.

The colour of the bubble can be controlled by the fourth property.
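To make the mapping from properties to visual channels concrete, here is a minimal sketch in plain Python. It assumes each record is a dict with the hypothetical keys "x", "y", "size" and "colour" (these names are illustrative, not from any library), and that the third property is non-negative. matplotlib's scatter() treats its `s` argument as a marker area in points squared, so the helper scales sizes linearly: a value twice as large gets twice the bubble area, not twice the diameter.

```python
# Turn records with four properties into the four inputs of a bubble chart.
# The record keys "x", "y", "size", "colour" are hypothetical, chosen for illustration.

def to_bubble_inputs(records, max_area=1000.0):
    """Return (xs, ys, sizes, colours) suitable for matplotlib's scatter().

    Sizes are scaled so the largest third-property value gets `max_area`
    (points squared); values are assumed to be non-negative.
    """
    if not records:
        return [], [], [], []
    xs = [r["x"] for r in records]            # first property: horizontal position
    ys = [r["y"] for r in records]            # second property: vertical position
    third = [r["size"] for r in records]      # third property: bubble area
    colours = [r["colour"] for r in records]  # fourth property: bubble colour
    largest = max(third)
    if largest > 0:
        sizes = [max_area * v / largest for v in third]
    else:
        sizes = [0.0] * len(third)
    return xs, ys, sizes, colours


records = [
    {"x": 1, "y": 2, "size": 5, "colour": 0.1},
    {"x": 3, "y": 4, "size": 10, "colour": 0.9},
]
xs, ys, sizes, colours = to_bubble_inputs(records)
print(sizes)  # [500.0, 1000.0]
```

The four lists can then be passed straight to matplotlib, e.g. `plot.scatter(xs, ys, s=sizes, c=colours)`.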

A sample program that creates a bubble chart with the Python library matplotlib is given below.

import matplotlib.pyplot as plot
import numpy as npy

# Create some dummy data with numpy's random functions.
# Bubble charts represent data with three or four dimensions:
# the x axis shows one property and the y axis another,
# the bubble size shows a third property and the bubble colour a fourth.

x = npy.random.rand(50)
y = npy.random.rand(50)
z = npy.random.rand(50)
colors = npy.random.rand(50)
# scatter() draws one bubble per point: s sets the marker area (in points^2), c sets the colour
plot.scatter(x, y, s=z * 1000, c=colors)
plot.show()

Here we generate some random data with numpy and plot the bubble chart with matplotlib.

A sample output is given below.

[Figure: Bubble Chart using Python]
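The chart above has no axis labels or legend, so a reader cannot tell what the size and colour mean. The sketch below is one possible extension, not the only way to do it: it adds axis labels, a colorbar for the fourth property and a size legend for the third, using matplotlib's `PathCollection.legend_elements(prop="sizes")` (available in matplotlib 3.1 and later). It also assumes numpy's `default_rng` generator with a fixed seed instead of `npy.random.rand`, and the Agg backend with `savefig()` so it runs without a display; the output file name `bubblechart.png` and the axis label texts are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plot
import numpy as npy

rng = npy.random.default_rng(seed=0)  # fixed seed so the chart is reproducible
x = rng.random(50)
y = rng.random(50)
z = rng.random(50)
colors = rng.random(50)

fig, ax = plot.subplots()
# Same mapping as before: s is the marker area, c the colour value.
bubbles = ax.scatter(x, y, s=z * 1000, c=colors, alpha=0.6, edgecolors="black")
ax.set_xlabel("first property (x)")
ax.set_ylabel("second property (y)")

# Colorbar explains the fourth property.
fig.colorbar(bubbles, ax=ax, label="fourth property (colour)")

# legend_elements(prop="sizes") builds legend handles from the marker areas;
# func converts the plotted area back into the original z values for the labels.
handles, labels = bubbles.legend_elements(prop="sizes", num=4, func=lambda s: s / 1000)
ax.legend(handles, labels, title="third property (size)", loc="upper right")

fig.savefig("bubblechart.png", dpi=100)
```

Using `fig, ax = plot.subplots()` instead of calling `plot.scatter()` directly keeps a handle on the scatter object, which the colorbar and size legend both need.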

 

R and Big Data

R programming is now getting a lot of attention. The reason, as I found, is that it can be used efficiently for big data analytics. R is a good statistical tool, and it is highly applicable to big data analytics. Systems today learn from data, or we teach them using data, and with R's advanced analytics it is easy to generate insights from large data sets. Many packages are now available that make R powerful enough to work on top of the latest big data technologies. Some of the libraries I have noticed are listed below.

1) Rhipe: RHIPE ('hree-pay') is the R and Hadoop Integrated Programming Environment.
For more details, see Rhipe.

2) Rhive: RHive is an R extension that facilitates distributed computing via Apache Hive.
For more details, see Rhive.

3) Rhbase: This R package provides basic connectivity to HBase, using the Thrift server. R programmers can browse, read, write, and modify tables stored in HBase.
For more details, see Rhbase.

4) Rhdfs: This R package provides basic connectivity to the Hadoop Distributed File System. R programmers can browse, read, write, and modify files stored in HDFS.
For more details, see Rhdfs.

5) Rmr: This R package allows an R programmer to perform statistical analysis via MapReduce on a Hadoop cluster.
For more details, see Rmr.

6) Plyrmr: This R package enables the R user to perform common data manipulation operations, as found in popular packages such as plyr and reshape2, on very large data sets stored on Hadoop. Like rmr, it relies on Hadoop MapReduce to perform its tasks, but it provides a familiar plyr-like interface while hiding many of the MapReduce details.
For more details, see Plyrmr.

7) Rmongo: A MongoDB database interface for R. The interface is provided via Java calls to the mongo-java-driver.
For more details, see Rmongo.
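Rmr and Plyrmr both rely on Hadoop MapReduce and hide its details from the R user. To show roughly what those hidden details are, here is a single-process Python sketch of the MapReduce model computing a per-key mean, a typical statistical aggregation. It is only an illustration of the model: it is not the rmr or plyrmr API, it does not use Hadoop, and the function names `map_phase`, `shuffle` and `reduce_phase` and the sample records are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, (value, 1)) pairs, one per input record."""
    for key, value in records:
        yield key, (value, 1)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's (sum, count) partials into a mean."""
    result = {}
    for key, partials in groups.items():
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        result[key] = total / count
    return result


records = [("a", 2.0), ("b", 10.0), ("a", 4.0), ("b", 20.0), ("a", 6.0)]
means = reduce_phase(shuffle(map_phase(records)))
print(means)  # {'a': 4.0, 'b': 15.0}
```

Carrying (sum, count) pairs instead of partial means is a deliberate choice: sums and counts can be added in any order across machines, whereas averaging partial averages would give the wrong answer when groups have different sizes.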