Rhipe was first developed by Saptarshi Guha.
Rhipe needs R and Hadoop, so install R and Hadoop first; both installations are explained in my previous posts. At the time of writing, the latest version of Rhipe is Rhipe-0.73.1 and the latest version of R is R-3.0.0. If you are using CDH4 (Cloudera's distribution of Hadoop), use Rhipe-0.73 or later, because older versions may not work with CDH4.
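Before installing on a CDH4 cluster, the version requirement can be checked from the tarball name. This is a minimal sketch, assuming the Rhipe_<version>.tar.gz naming used in this post and GNU sort (for its -V version ordering); version_ge and rhipe_ok_for_cdh4 are hypothetical helpers, not part of Rhipe.

```shell
# Hypothetical helpers: check that a Rhipe tarball is new enough for CDH4.
# Assumes GNU sort, whose -V option orders version strings.

# version_ge A B: succeed if version A >= version B.
version_ge() {
    # After a version sort, the smaller of the two comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# rhipe_ok_for_cdh4 FILE: succeed if FILE (Rhipe_<version>.tar.gz) is 0.73 or later.
rhipe_ok_for_cdh4() {
    ver=$(basename "$1" .tar.gz)   # e.g. Rhipe_0.73.1
    ver=${ver#Rhipe_}              # e.g. 0.73.1
    version_ge "$ver" 0.73
}
```

For example, `rhipe_ok_for_cdh4 Rhipe_0.73.1.tar.gz && echo "OK for CDH4"`.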
Rhipe (R and Hadoop Integrated Programming Environment) integrates R with Hadoop. It is well suited to statistical and analytical computations on very large data: because R is integrated with Hadoop, the work runs in distributed mode, i.e. as MapReduce jobs.
Further explanations of Rhipe are available at http://www.datadr.org/
Hadoop, R, protocol buffers and rJava should be installed before installing Rhipe.
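A quick way to confirm that the command-line prerequisites are on the PATH is a small helper like the one below. It is a sketch with a hypothetical name: it only checks commands (for example R, hadoop, and the protoc and pkg-config tools used with protobuf), not R packages such as rJava.

```shell
# Hypothetical helper: report every command that is not on the PATH.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, `require_cmds R hadoop protoc pkg-config` prints one line per missing tool and fails if any are absent.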
We are installing Rhipe on a Hadoop cluster, so a submitted job may execute on any of the TaskTracker nodes. R and Rhipe therefore have to be installed on every TaskTracker node; otherwise you will get an exception such as “Cannot find R”.
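Repeating the same step on every TaskTracker node can be scripted. This is a minimal sketch, assuming passwordless SSH from the current machine and a hosts file (a hypothetical name such as tasktrackers.txt) with one hostname per line; run_on_nodes is also hypothetical.

```shell
# Hypothetical helper: run one command on every TaskTracker node.
# Assumes passwordless SSH and one hostname per line in the hosts file.
# Set SSH (e.g. to "echo" for a dry run) to override plain ssh.
run_on_nodes() {
    hosts_file=$1
    shift
    failed=0
    while IFS= read -r host; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "== $host"
        # </dev/null keeps ssh from swallowing the rest of the hosts file.
        if ! ${SSH:-ssh} "$host" "$@" </dev/null; then
            echo "command failed on $host" >&2
            failed=1
        fi
    done < "$hosts_file"
    return $failed
}
```

For example, `run_on_nodes tasktrackers.txt 'which R'` shows which nodes still lack R; the function keeps going after a failure and returns non-zero at the end.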
Installing Protocol Buffer
Download Protocol Buffers 2.4.1 (protobuf-2.4.1.tar.gz), then unpack, build and install it:
tar -xzvf protobuf-2.4.1.tar.gz
chmod -R 755 protobuf-2.4.1
cd protobuf-2.4.1
./configure
make
make install
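The build steps can also be wrapped so that they stop at the first failure instead of running make on a tree that did not configure. This is a sketch; build_protobuf is a hypothetical name, and it assumes the tarball unpacks into a directory named after the file (protobuf-2.4.1 here).

```shell
# Hypothetical helper: unpack, configure, build and install a protobuf
# source tarball. The body runs in a subshell ( ... ), so the caller's
# working directory is unchanged, and && stops at the first failing step.
build_protobuf() (
    tarball=$1
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; exit 1; }
    dir=$(basename "$tarball" .tar.gz)   # e.g. protobuf-2.4.1
    tar -xzf "$tarball" &&
        cd "$dir" &&
        ./configure &&
        make &&
        make install                     # writes to /usr/local, so run as root
)
```

For example, `build_protobuf protobuf-2.4.1.tar.gz` run as root in the download directory.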
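Set the environment variable PKG_CONFIG_PATH. Open /etc/bashrc:
nano /etc/bashrc
and add the following line:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
Save and exit, then run source /etc/bashrc (or open a new shell) so that the variable takes effect.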
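If the setup is repeated, editing /etc/bashrc by hand can leave duplicate export lines. A minimal sketch of an idempotent alternative, using a hypothetical add_export helper:

```shell
# Hypothetical helper: append "export NAME=VALUE" to FILE only if that
# exact line is not already present, so re-running the setup is harmless.
add_export() {
    file=$1 line="export $2=$3"
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

As root: `add_export /etc/bashrc PKG_CONFIG_PATH /usr/local/lib/pkgconfig`, then `source /etc/bashrc`. The same helper works for any other export line added to /etc/bashrc.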
Then execute the following commands to check the installation:
pkg-config --modversion protobuf
This will show the version number 2.4.1
pkg-config --libs protobuf
This should display:
-pthread -L/usr/local/lib -lprotobuf -lz -lpthread
If both commands give this output, protobuf is properly installed.
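The two pkg-config checks above can be combined into one command that also says what went wrong. This is a sketch; check_protobuf is a hypothetical name, and the PKG_CONFIG variable is only there so a different pkg-config can be substituted.

```shell
# Hypothetical helper: run both pkg-config checks and explain a failure.
# PKG_CONFIG can name another pkg-config binary (defaults to pkg-config).
check_protobuf() {
    expected=${1:-2.4.1}
    pc=${PKG_CONFIG:-pkg-config}
    version=$($pc --modversion protobuf 2>/dev/null) || {
        echo "pkg-config cannot find protobuf; is PKG_CONFIG_PATH set?" >&2
        return 1
    }
    if [ "$version" != "$expected" ]; then
        echo "found protobuf $version, expected $expected" >&2
        return 1
    fi
    # Pad with spaces so -lprotobuf matches only as a whole flag.
    case " $($pc --libs protobuf) " in
        *" -lprotobuf "*) echo "protobuf $version OK" ;;
        *) echo "-lprotobuf missing from pkg-config --libs" >&2; return 1 ;;
    esac
}
```

Running `check_protobuf` with no argument checks for version 2.4.1.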
Set the environment variables for Hadoop. Open /etc/bashrc:
nano /etc/bashrc
and add the following lines:
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_BIN=/usr/lib/hadoop/bin
export HADOOP_CONF_DIR=/etc/hadoop/conf
Save and exit.
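A typo in one of these paths is easy to miss. This hypothetical check, run after sourcing /etc/bashrc, reports any of the three variables that is unset or does not name an existing directory; it does not check what the directories contain.

```shell
# Hypothetical helper: verify the Hadoop variables set in /etc/bashrc.
check_hadoop_env() {
    status=0
    for name in HADOOP_HOME HADOOP_BIN HADOOP_CONF_DIR; do
        eval "value=\${$name:-}"   # value of the variable called $name
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

For example, `. /etc/bashrc && check_hadoop_env`.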
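Next, add /usr/local/lib (where the protobuf libraries were installed) to the dynamic linker's search path:
cd /etc/ld.so.conf.d/
nano Protobuf-x86.conf
Add /usr/local/lib as the content of Protobuf-x86.conf, then save and exit. Run ldconfig afterwards so that the linker cache picks up the new path.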
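The same step can be done without an editor, followed by a check that the linker can now see libprotobuf. This is a sketch with hypothetical helper names; the LDCONFIG variable only exists so another ldconfig can be substituted, and ldconfig itself usually has to be run as root.

```shell
# Hypothetical helper: list LIB_DIR in the ld.so config file CONF_FILE once.
register_lib_dir() {
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Hypothetical helper: succeed if the linker cache (ldconfig -p) lists NAME.
lib_in_cache() {
    ${LDCONFIG:-ldconfig} -p | grep -q "$1"
}
```

As root: `register_lib_dir /etc/ld.so.conf.d/Protobuf-x86.conf /usr/local/lib`, then `ldconfig`, then `lib_in_cache libprotobuf.so && echo "libprotobuf found"`.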
Download the rJava source tarball. At the time of writing, the latest version of rJava is 0.9-4 (rJava_0.9-4.tar.gz).
Install rJava using the following command:
R CMD INSTALL rJava_0.9-4.tar.gz
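R CMD INSTALL can finish while the package still fails to load, so it is worth loading it right away. This is a sketch; install_r_pkg is a hypothetical name, and R_CMD and RSCRIPT are hypothetical overrides for the R and Rscript binaries.

```shell
# Hypothetical helper: install an R source tarball, then confirm that the
# package actually loads in a fresh, non-interactive R session.
install_r_pkg() {
    tarball=$1 pkg=$2
    ${R_CMD:-R} CMD INSTALL "$tarball" || return 1
    if ! ${RSCRIPT:-Rscript} -e "library($pkg)" >/dev/null 2>&1; then
        echo "$pkg installed but cannot be loaded" >&2
        return 1
    fi
}
```

For example, `install_r_pkg rJava_0.9-4.tar.gz rJava`.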
Download the Rhipe tarball (Rhipe_0.73.1.tar.gz) and install it:
R CMD INSTALL Rhipe_0.73.1.tar.gz
This installs Rhipe.
After this, type R in the terminal to start an R session, then load Rhipe:
library(Rhipe)
#This will display
| Please call rhinit() else RHIPE will not run |
Then initialize Rhipe:
rhinit()
#This will display
Rhipe: Detected CDH4 jar files, using RhipeCDH4.jar
Initializing Rhipe v0.73
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/client-0.20/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/client/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Initializing mapfile caches
Now you can execute your Rhipe scripts.
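To repeat this check without an interactive session, for example on each TaskTracker node, the rhinit() output can be scanned for the two "Initializing" lines shown above. This is a sketch: rhipe_init_ok is a hypothetical name, and it assumes those messages start a line as in the session above. The SLF4J lines are warnings about duplicate logging jars on the Hadoop classpath, not errors, so the check ignores them.

```shell
# Hypothetical helper: read rhinit() output on stdin and succeed only if
# both initialization messages are present; other lines are ignored.
rhipe_init_ok() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q '^Initializing Rhipe' &&
        printf '%s\n' "$out" | grep -q '^Initializing mapfile caches'
}
```

For example, `Rscript -e 'library(Rhipe); rhinit()' 2>&1 | rhipe_init_ok && echo "Rhipe initialized"`.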