Facebook Opensourced Presto

Facebook opensourced its data processing technique ‘Presto’ to the world. Presto is a distributed query engine based on ANSI SQL. It is very optimized and currently running with more than 300 petabytes of data, which may one among the top big data processing systems. Presto is a totally different from mapreduce. It is an in memory data processing mechanism and is very much optimised. From the details given in the facebook newsletter and presto website, it is 10 times faster than Hive. mapreduce.Hive came from facebook only, so presto will definitely beat hive. Hive queries are ultimately running as multiple mapreduce jobs and it will take more time. From my point of view, the competition may be between Cloudera Impala and Presto. Impala’s performance with huge datasets is not available now from any production environments because it is a budding technology from cloudera family, but presto is already tested and running in huge dataset production environment. Another interesting fact about presto is that we can use the already existing infrastructure and hadoop cluster for deploying presto, because presto supports hdfs as its underlying data storage. It supports other storage systems also. So it is flexible. Leading internet companies including Airbnb and Dropbox are using Presto. Presto code and further details are available in this link

I have deployed Presto and Impala on a small cluster of 8 nodes. I haven’t got enough time to explore more on presto. I am planning to explore more on the coming days. 🙂

Advertisements

About amalgjose
I am an Electrical Engineer by qualification, now I am working as a Software Engineer. I am very much interested in Electrical, Electronics, Mechanical and now in Software fields. I like exploring things in these fields. I like travelling, long drives and very much addicted to music.

2 Responses to Facebook Opensourced Presto

  1. I’m really excited to see your results on that topic. We have used Hive in numerous projects and would like to take the next step to improve performance, either with Impala or Presto. Let’s see which one is faster and more convenient to use.

  2. Please post back the performance results when you are having some. Good on you Amal!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: