Rhipe was first developed by Saptarshi Guha.
Rhipe needs R and Hadoop. So first install R and hadooop. Installation of R and hadoop are well explained in my previous posts. The latest version of Rhipe as of now is Rhipe-0.73.1. and  latest available version of R is R-3.0.0. If you are using CDH4 (Cloudera distribution of hadoop) , use Rhipe-0.73 or later versions, because older versions may not work with CDH4.
Rhipe is an R and Hadoop integrated programming environment. Rhipe integrates R and Hadoop. Rhipe is very good for statistical and analytical calculations of very large data. Because here R is integrated with hadoop, so it will process in distributed mode, ie  mapreduce.
Futher explainations of Rhipe are available in http://www.datadr.org/

Prerequisites

Hadoop, R, protocol buffers and rJava should be installed before installing Rhipe.
We are installing Rhipe in a hadoop cluster. So the job submitted may execute in any of the tasktracker nodes. So we have to install R and Rhipe in all the tasktracker nodes, otherwise you will face an exception “Cannot find R” or something similar to that.

Installing Protocol Buffer

Download the protocol buffer 2.4.1 from the below link

http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz

tar -xzvf protobuf-2.4.1.tar.gz

cd protobuf-2.4.1

chmod -R 755 protobuf-2.4.1

./configure

make

make install

Set the environment variable PKG_CONFIG_PATH

nano /etc/bashrc

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

save and exit

Then executed the following commands to check the installation

pkg-config --modversion protobuf

This will show the version number 2.4.1
Then execute

pkg-config --libs protobuf

This will display the following things

-pthread -L/usr/local/lib -lprotobuf -lz –lpthread

If these two are working fine, This means that the protobuf is properly installed.

Set the environment variables for hadoop

For example

nano /etc/bashrc

export HADOOP_HOME=/usr/lib/hadoop

export HADOOP_BIN=/usr/lib/hadoop/bin

export HADOOP_CONF_DIR=/etc/hadoop/conf

save and exit

Then


cd /etc/ld.so.conf.d/

nano Protobuf-x86.conf

/usr/local/lib   # add this value as the content of Protobuf-x86.conf

Save and exit

/sbin/ldconfig

Installing rJava

Download the rJava tarball from the below link.

http://cran.r-project.org/web/packages/rJava/index.html

The latest version of rJava available as of now is rJava_0.9-4.tar.gz

install rJava using the following command

R CMD INSTALL rJava_0.9-4.tar.gz

Installing Rhipe

Rhipe can be downloaded from the following link
https://github.com/saptarshiguha/RHIPE/blob/master/code/Rhipe_0.73.1.tar.gz

R CMD INSTALL Rhipe_0.73.1.tar.gz

This will install Rhipe

After this type R in the terminal

You will enter into R terminal

Then type

library(Rhipe)

#This will display

------------------------------------------------

| Please call rhinit() else RHIPE will not run |

————————————————

rhinit()

#This will display

Rhipe: Detected CDH4 jar files, using RhipeCDH4.jar
Initializing Rhipe v0.73
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/client-0.20/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/client/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Initializing mapfile caches

Now you can execute you Rhipe scripts.

Advertisement