Heavy mapreduce jobs may run for several hours, and there can be several of them running at a time. Checking the status of each job manually is a tedious task, and I don't like doing it. Wrapping the java programs in a monitoring script is not a clean approach either; I consider such designs poor ones.

My requirement was to get a notification on completion of these mapreduce jobs. They are critical jobs, and I don't want to keep checking their status and waiting for them to finish.

Hadoop provides a configuration property that solves this problem, and it takes only a few lines of code. Add these three lines to the Driver class:

conf.set("job.end.notification.url", "http://myserverhost:8888/notification/jobId=$jobId?status=$jobStatus");
conf.setInt("job.end.retry.attempts", 3);
conf.setInt("job.end.retry.interval", 1000);

By setting these properties, hadoop sends an http request to the URL configured in job.end.notification.url once the job completes, with the variables $jobId and $jobStatus replaced by the actual job id and status. All we need is a small webservice that accepts this request and sends an email. I used python for the webservice and the email utility because of its simplicity. When a request reaches the webservice, it parses the arguments and calls the email sending module.

This is a very simple example. Instead of email, we can trigger other kinds of notifications such as an sms, a phone call, or kicking off another application. The job.end.notification.url property is very helpful in tracking asynchronous mapreduce jobs, and it is a clean approach because we are not running any extra script to poll the status of the job: the job itself reports its status, and the python program only collects it and sends the notification. A quick way to try the flow end to end is shown after the webservice code.

The python code for the email notification (SendEmail.py) and the webservice (SimpleRest.py) is given below.


__author__ = 'Amal G Jose'

import smtplib


class SendEmail(object):
    def __init__(self):
        # Credentials of the sender account
        self.email_id = "sender@gmail.com"
        self.email_password = "password"

    def send_email(self, job_id, job_status):
        FROM = self.email_id
        TO = ['recepient@email.com']
        SUBJECT = "Mapreduce Job Status of Job ID : %s" % (job_id)
        TEXT = "Status of JobId : %s was : %s" % (job_id, job_status)
        # Prepare the actual message
        message = "From: %s\nTo: %s\nSubject: %s\n\n%s" % (FROM, ", ".join(TO), SUBJECT, TEXT)
        try:
            # Send the mail through Gmail's SMTP server using TLS
            server = smtplib.SMTP("smtp.gmail.com", 587)
            server.ehlo()
            server.starttls()
            server.login(self.email_id, self.email_password)
            server.sendmail(FROM, TO, message)
            server.close()
            print 'Email Sent Successfully'
        except:
            print "Failed to send email"

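To check the email path on its own, the class can be exercised directly. This is only a quick sketch; the job id and status below are placeholder values, and the sender credentials in SendEmail.py have to be filled in first.

# Standalone check of the email sender (job id and status are placeholders)
from SendEmail import SendEmail

notifier = SendEmail()
notifier.send_email("job_201501011200_0001", "SUCCEEDED")

The webservice that receives hadoop's notification and calls this class is given below.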


__author__ = 'Amal G Jose'

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web

from SendEmail import SendEmail

PORT = 8888


class JobStatus(tornado.web.RequestHandler):
    def get(self):
        # Hadoop substitutes $jobId and $jobStatus in the notification URL,
        # so the values arrive here as query parameters.
        job_id = self.get_argument("jobId", default="unknown", strip=True)
        job_status = self.get_argument("status", default="unknown", strip=True)
        notify = SendEmail()
        notify.send_email(job_id, job_status)
        self.write("Notified !!")


if __name__ == "__main__":
    app = tornado.web.Application(handlers=[(r"/notification", JobStatus)])
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(PORT)
    print "Starting server"
    tornado.ioloop.IOLoop.instance().start()

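With the server running, the whole flow can be tried without launching a real job by sending the same kind of request that hadoop would send after substituting the variables. This is just a local test sketch; it assumes the webservice is running on the same machine on port 8888, and the job id is a made-up example.

# Simulate the notification request hadoop sends on job completion
# (the job id and status here are example values)
import urllib2

url = "http://localhost:8888/notification?jobId=job_201501011200_0001&status=SUCCEEDED"
response = urllib2.urlopen(url)
print response.read()

If everything is wired correctly, the server replies with "Notified !!" and the email goes out.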
