Rhipe on AWS (YARN and MRv1)

Rhipe is an R library that runs on top of Hadoop. It uses an approach similar to Hadoop streaming to run R programs on Hadoop. To know more about Rhipe, please check my older post, which covered the basics and the installation steps for running Rhipe on CDH4 (MRv1). YARN has since become popular and almost everyone is using it, so a lot of people have asked me for help installing Rhipe on YARN. Rhipe works very well on YARN. Here I am just giving a pointer on how to install Rhipe on AWS (Amazon Web Services). I checked this script and it works fine. It contains the bootstrap script and the installables that install Rhipe automatically on AWS.

For those who are new to AWS, I will explain the basics of AWS EMR and bootstrap scripts. Amazon Web Services provides a lot of cloud services. Among them, Elastic MapReduce (EMR) is a service that provides a Hadoop cluster. It is one of the best solutions for users who don't want to maintain a data center and don't want the headaches of Hadoop administration. AWS provides a list of components that can be installed on the Hadoop cluster, and we can choose them while creating the cluster through the web console. Examples of such components are Hive, Pig, Impala, Hue, HBase, Hunk, etc.

In most cases, though, users need some extra software as well, and what is needed depends on the user. The automated cluster launch takes less than 10 minutes (I tried it with around 100 nodes), but installing extra software manually on all of those nodes would take several hours. Amazon provides a solution for this problem: the user can supply custom shell scripts, and these scripts are executed on all the nodes while Hadoop is being installed. Such a script is called a bootstrap script. Here we are installing Rhipe using a bootstrap script.

For users who want to install Rhipe on AWS Hadoop MRv1, you can follow this url. Please ensure that you are using the correct AMI. AMI stands for Amazon Machine Image; it is simply the version of the machine image that Amazon provides for the cluster nodes. For users who want to install Rhipe on AWS Hadoop MRv2 (YARN), you can follow this url. This will work perfectly on AWS AMI 3.2.1. You can download the GitHub repo to your local machine and upload it to your S3 bucket. Then launch the cluster by specifying the details mentioned in the installation doc.
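
If you prefer launching the cluster from code instead of the web console, the request could look roughly like the sketch below. It only builds the arguments in the shape that boto3's EMR `run_job_flow` call accepts; it does not contact AWS. The function name `build_emr_request`, the bucket path `s3://my-bucket/rhipe/bootstrap.sh`, the cluster name and the instance types are my own placeholders, not values from the installation doc, so replace them with the ones given there.

```python
def build_emr_request(bootstrap_script_s3_path, instance_count, ami_version="3.2.1"):
    """Build keyword arguments shaped like boto3's EMR run_job_flow request.

    All concrete values here (cluster name, instance types) are illustrative
    assumptions; take the real ones from the installation doc.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "Name": "rhipe-cluster",          # hypothetical cluster name
        "AmiVersion": ami_version,        # the post reports AMI 3.2.1 working for YARN
        "Instances": {
            "MasterInstanceType": "m3.xlarge",   # assumed instance type
            "SlaveInstanceType": "m3.xlarge",    # assumed instance type
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster up for interactive use
        },
        # The bootstrap action is what runs the Rhipe install script on every node.
        "BootstrapActions": [
            {
                "Name": "Install Rhipe",
                "ScriptBootstrapAction": {
                    "Path": bootstrap_script_s3_path,
                    "Args": [],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical S3 location of the bootstrap script uploaded from the repo.
request = build_emr_request("s3://my-bucket/rhipe/bootstrap.sh", 4)
print(request["BootstrapActions"][0]["ScriptBootstrapAction"]["Path"])
```

With boto3 installed and AWS credentials configured, the dictionary could then be passed on as `boto3.client("emr").run_job_flow(**request)`. I have kept that call out of the sketch so it can be read and checked without an AWS account.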

For non-AWS users

For users who want to install Rhipe on YARN (on any Hadoop cluster), you can either build Rhipe for your version of Hadoop and put that jar inside the Rhipe directory, or directly try the ready-made Rhipe build for YARN. All the Rhipe versions are available in a common repository, and you can download the installable from this location. You have to follow the steps in all the shell scripts present in the given repository. This is a manual activity, and you have to repeat it on every node in your Hadoop cluster.
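
Since the same steps have to be repeated on every node, a small helper can at least generate the commands for you. The sketch below is only an illustration: it assumes the scripts from the repository can run unattended over SSH, and the host names, the user `hadoop` and the script name `install_rhipe.sh` are hypothetical placeholders. It builds the command lists but does not execute anything.

```python
import shlex


def build_install_commands(hosts, scripts, user="hadoop"):
    """Return one ssh command (as an argument list) per host and script.

    Host names, user and script names are placeholders supplied by the caller;
    nothing is executed here.
    """
    commands = []
    for host in hosts:
        for script in scripts:
            # Quote the script path so unusual characters survive the remote shell.
            remote_command = "bash " + shlex.quote(script)
            commands.append(["ssh", f"{user}@{host}", remote_command])
    return commands


# Hypothetical two-node cluster and a single install script.
commands = build_install_commands(["node1", "node2"], ["install_rhipe.sh"])
for command in commands:
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once you have verified the scripts on a single node by hand.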

We connected computers in the past, now computers are connecting us…!!

A new revolution is about to happen in the world, driven by connected devices and their place in human life. We call it the Internet of Things. A lot of discussions and experiments are now happening around the world to make this concept a reality. From the point of view of a person who knows electronics, the concept is very simple. The problem is that making the Internet of Things a reality, and making it usable by every individual, needs much more effort. To develop an application that makes our lives easier by connecting things, we need the skill sets of both a software engineer and an electronics engineer. An electronics engineer can design the circuit, wire it up and get the signals out properly. But those are just electrical signals, useful only to technical people. Such systems already existed in the past. Then what is different in IoT?

IoT involves connecting systems, sensing each and every member of the network, and controlling every single node in the network based on feedback. The control need not follow a pre-written script; it can be dynamic. This requires extensive analytics knowledge along with electronics knowledge. If we sense and control every member of a large network, the volume of data will be high, and this is where big data solutions come in. So if we combine electronics and analytics knowledge, we can build superb systems that can revolutionize the world. Making everything connected and everything controllable has its pros and cons, just like atomic energy. Good people used it to generate energy, and bad people used it to build atom bombs and destroy people. I hope IoT will be used in the right way!