ImportError: libSM.so.6: cannot open shared object file: No such file or directory

For CentOS users

yum install libXext libSM libXrender

For Ubuntu users

apt-get update && apt-get install -y libsm6 libxext6 libxrender1 libfontconfig1

 

Stream Processing Framework in Python – Faust

I was looking for a highly scalable stream processing framework in Python. Until now I had been using Spark Streaming to read data from high-throughput streams, but Spark felt a little heavy to me because its minimum system requirements are high.

Yesterday, while researching alternatives, I found a framework called Faust. I started exploring it, and my initial impression is very good.

Faust can run in a distributed way, so the same program can run on multiple machines. This improves performance.

I tried executing the sample program from their website and it worked properly. The same program is pasted below. I used CDH Kafka 4.1.0, and the program worked seamlessly.

To execute the program, I used the following command.

python sample_faust.py worker -l info

The above program reads data from Kafka and prints each message. The framework is not just about reading messages in parallel from streaming sources. It also integrates with RocksDB, an embedded key-value data store that was open-sourced by Facebook and is written in C++.

Merge two dictionaries in Python

This is the simplest way to merge or combine two dictionaries in Python. This operation is supported in Python 3.5 and above.
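A minimal sketch of such a merge, assuming dictionary unpacking with `**` (available from Python 3.5) is the operation meant. The two input dictionaries are made up for illustration.

```python
# Two example dictionaries (illustrative values, not from the original post).
first = {'p': 2, 'q': 4}
second = {'r': 6, 's': 8}

# Unpack both into a new dict. If a key appears in both,
# the value from the right-hand dictionary wins.
merged = {**first, **second}
print(merged)
```

Neither input dictionary is modified; `merged` is a new object.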

 

Sample Output

{'p': 2, 'q': 4, 'r': 6, 's': 8}

Convert csv to json using pandas

The following sample programs show how to read a csv file and convert it into json data using pandas. Two programs are explained in this blog post. The first expects the column names to be present in the csv file; the second does not need column names in the file.

The first program expects the headers in the first line of the csv. If the headers are missing, we have to pass them explicitly in the program.

Sample Input

EMPID,FirstName,LastName,Salary
1001,Amal,Jose,100000
1002,Edward,Joe,100001
1003,Sabitha,Sunny,210000
1004,John,P,50000
1005,Mohammad,S,75000

Here the first line of the csv data is the header

Sample Output

[{"EMPID":1001,"FirstName":"Amal","LastName":"Jose","Salary":100000},{"EMPID":1002,"FirstName":"Edward","LastName":"Joe","Salary":100001},{"EMPID":1003,"FirstName":"Sabitha","LastName":"Sunny","Salary":210000},{"EMPID":1004,"FirstName":"John","LastName":"P","Salary":50000},{"EMPID":1005,"FirstName":"Mohammad","LastName":"S","Salary":75000}]
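A minimal sketch of the header case, assuming `pandas.read_csv` for reading and `DataFrame.to_json(orient="records")` for the conversion. The csv text is held in an in-memory string (shortened to two rows) instead of a file, so that the example is self-contained.

```python
import io

import pandas as pd

# Shortened copy of the sample input; the first line is the header.
csv_text = """EMPID,FirstName,LastName,Salary
1001,Amal,Jose,100000
1002,Edward,Joe,100001
"""

# By default read_csv treats the first line as the column names.
df = pd.read_csv(io.StringIO(csv_text))

# orient="records" produces a JSON array with one object per row.
json_text = df.to_json(orient="records")
print(json_text)
```

To read from disk, replace `io.StringIO(csv_text)` with the path of the csv file.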

 

If the csv file contains a header row and you want to override its column names, you should explicitly pass header=0 together with the new names. If headers are not present in the csv file, we have to pass the field names explicitly as a list to the names argument. Duplicates in this list are not allowed. A sample implementation is given below.
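A minimal sketch of the headerless case, assuming the `names` and `header` parameters of `pandas.read_csv`. The column list and the in-memory csv text are illustrative.

```python
import io

import pandas as pd

# Headerless input: every line is a data row.
csv_text = """1001,Amal,Jose,100000
1002,Edward,Joe,100001
"""

# Column names supplied by the program because the file has none.
columns = ["EMPID", "FirstName", "LastName", "Salary"]

# header=None tells pandas not to consume the first line as a header.
df = pd.read_csv(io.StringIO(csv_text), names=columns, header=None)

json_text = df.to_json(orient="records")
print(json_text)
```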

 

How to convert a csv file to json file ?

Sometimes we get a dataset in csv format and need to convert it to json format. There are multiple ways to do this conversion; one of them is detailed below. The following program converts a csv file into a multiline json file. Based on your requirement, you can modify the field names and reuse this program.

The sample input is given below.

1001,Amal,Jose,100000
1002,Edward,Joe,100001
1003,Sabitha,Sunny,210000
1004,John,P,50000
1005,Mohammad,S,75000

 

Output multiline json is given below.

{"EmpID": "1001", "FirstName": "Amal", "LastName": "Jose", "Salary": "100000"}
{"EmpID": "1002", "FirstName": "Edward", "LastName": "Joe", "Salary": "100001"}
{"EmpID": "1003", "FirstName": "Sabitha", "LastName": "Sunny", "Salary": "210000"}
{"EmpID": "1004", "FirstName": "John", "LastName": "P", "Salary": "50000"}
{"EmpID": "1005", "FirstName": "Mohammad", "LastName": "S", "Salary": "75000"}
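A minimal sketch of this conversion, assuming the standard library `csv` and `json` modules. In-memory buffers stand in for the input and output files so the example runs on its own; the field names match the output shown above.

```python
import csv
import io
import json

# Field names are supplied here because the input has no header row.
field_names = ["EmpID", "FirstName", "LastName", "Salary"]

csv_text = "1001,Amal,Jose,100000\n1002,Edward,Joe,100001\n"

# Stand-ins for open("input.csv") and open("output.json", "w").
csv_file = io.StringIO(csv_text)
json_file = io.StringIO()

# DictReader maps each row to the given field names; values stay strings.
reader = csv.DictReader(csv_file, fieldnames=field_names)
for row in reader:
    # One JSON object per line gives the multiline json format.
    json_file.write(json.dumps(row) + "\n")

print(json_file.getvalue(), end="")
```

Because the `csv` module does no type inference, the numbers appear as quoted strings in the output.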

 

 

Visualization using Python

Python is a powerful programming language. It can be used for developing almost all types of applications. I have used Python for developing IoT applications, data science applications, statistical applications, web services, automation, networking, web applications, big data processing, visualization, etc.

In this blog post, I will introduce some of the powerful visualization libraries available in Python.

  • Pandas Visualization – The core of this library is matplotlib.
  • Matplotlib – This is one of the most popular visualization libraries in python.
  • ggplot – Based on R’s ggplot2
  • Seaborn – A data visualization library based on matplotlib. It provides a high-level interface for drawing statistical graphics.
  • Plotly – An open-source, interactive graphing library for Python

 

What is Pandas ?

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. Pandas comes with two primary data structures

  • Series – (One dimensional)
  • DataFrame – (Two dimensional)
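A small sketch of the two structures, with made-up employee data for illustration:

```python
import pandas as pd

# Series: a one-dimensional labelled array.
salaries = pd.Series([100000, 100001, 210000], index=[1001, 1002, 1003])

# DataFrame: a two-dimensional table with named columns.
employees = pd.DataFrame(
    {
        "FirstName": ["Amal", "Edward", "Sabitha"],
        "Salary": [100000, 100001, 210000],
    },
    index=[1001, 1002, 1003],
)

print(salaries.ndim, employees.ndim)

# Selecting one column of a DataFrame gives back a Series.
print(type(employees["Salary"]))
```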

These two structures help us handle the majority of use cases. Those who are comfortable with the R programming language can easily implement their logic with pandas in Python, often in a more powerful way. Users get almost all the functionality of R's data frame. Pandas is built on top of the popular NumPy package.

Pandas has very good time series handling and processing capabilities. We can avoid unnecessary loops and logic by using pandas. It is capable of

  • Frequency conversion (e.g. creating 5-minute data from a dataset with 1-second frequency)
  • Date range generation
  • Moving window statistics
  • Date shifting, etc.
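The capabilities listed above can be sketched as follows. This is only an illustration on synthetic data, assuming `pandas.date_range`, `Series.resample`, `Series.rolling` and `Series.shift`; the start date and values are made up.

```python
import pandas as pd

# Date range generation: 600 timestamps at 1-second frequency (10 minutes).
index = pd.date_range("2024-01-01", periods=600, freq="s")
series = pd.Series(range(600), index=index)

# Frequency conversion: aggregate the 1-second data into 5-minute means.
five_min = series.resample("5min").mean()

# Moving window statistics: mean over a sliding window of 3 samples.
rolling_mean = series.rolling(window=3).mean()

# Date shifting: move every value one step later; the first slot becomes NaN.
shifted = series.shift(1)

print(five_min)
```

None of these steps needs an explicit Python loop.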

Since there is already plenty of documentation on pandas, I am not going to explain it in detail. I will explain some use cases with pandas implementations in further blog posts, where I will use pandas and other scientific libraries extensively.

 

Functions as Objects in Python

Python is very powerful and easy to learn. Applications can be developed very quickly in Python because of its simplicity.

Everything in Python is an object, and this includes functions. Are you aware of the following features of functions in Python? I was not aware of them during my first few years.

  • Functions can be the elements inside various data structures like lists, dictionaries etc.

A few examples

Function as argument to another function

A sample Python program that passes a function as an argument to another function is given below.
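A minimal sketch of the idea; the function names `square` and `apply` are made up for illustration.

```python
def square(number):
    """Return the square of a number."""
    return number * number


def apply(func, value):
    """Call whatever function is passed in with the given value."""
    # func is an ordinary argument that happens to be callable.
    return func(value)


# Pass the function object itself (no parentheses after square).
result = apply(square, 5)
print(result)  # 25

# Built-in functions can be passed the same way.
print(apply(len, "faust"))  # 5
```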

Functions as elements within data structures like list or dict()

A simple implementation of passing a list of functions as an argument to another function is shared below.
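A minimal sketch, with hypothetical helper functions `add_one`, `double` and `run_all`:

```python
def add_one(number):
    return number + 1


def double(number):
    return number * 2


def run_all(functions, value):
    """Call every function in the list with the same value."""
    return [func(value) for func in functions]


# A list whose elements are functions (abs is a built-in).
results = run_all([add_one, double, abs], -3)
print(results)  # [-2, -6, 3]

# Functions can also be dictionary values, looked up by name.
operations = {"add_one": add_one, "double": double}
picked = operations["double"](10)
print(picked)  # 20
```

Storing functions in a dictionary like this is a common way to replace a long if/elif chain.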

I hope this will help someone. 🙂

 

What is a Stack ?. How to implement Stack in Python ?

What is a Stack ?

A stack is a structure in which items are stored and retrieved in LIFO order. LIFO means Last In, First Out. We can see several stacks in our day-to-day life. A simple example of a stack using paper is shown below. In this arrangement, the paper is stacked from bottom to top and taken back from top to bottom.

[Image: stack of paper]

 

The insert and delete operations are often called push and pop. The schematic diagram of a STACK is given below. Here you can see how the items are pushed and taken out from the STACK.

 

[Image: schematic diagram of a stack]

In Python, a stack can be implemented in the following ways.

  • list
  • queue.LifoQueue
  • collections.deque

 

Stack Implementation using LIST in Python

The native data structure list can be used as a stack. A simple list is given below.

[1,2,3,4,5,6,7,8]

The push operation can be performed using the list's append() method, and the pop operation using its pop() method. Using append() and pop() together gives LIFO behavior, which can serve as a simple implementation of a stack. The performance of a list-based stack can degrade with larger data, so this approach is best suited to handling small amounts of data.

The following program shows a simple implementation of stack using python list
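A minimal sketch using only append() and pop() on a built-in list; the pushed values are arbitrary.

```python
stack = []

# Push: append() adds the item on top of the stack (end of the list).
stack.append("a")
stack.append("b")
stack.append("c")
print(stack)  # ['a', 'b', 'c']

# Pop: pop() removes and returns the most recently added item.
top = stack.pop()
print(top)    # c
print(stack)  # ['a', 'b']

# Calling pop() on an empty list raises IndexError, so check first.
while stack:
    stack.pop()
```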

 

Stack Implementation using LifoQueue (Queue) in Python

A stack can be implemented using the LifoQueue class in Python's queue module. A simple implementation is given below. The program is self-explanatory.
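A minimal sketch with `queue.LifoQueue`; the `maxsize` of 3 and the pushed values are arbitrary choices for illustration.

```python
from queue import LifoQueue

# maxsize limits how many items the stack can hold (0 means unlimited).
stack = LifoQueue(maxsize=3)

# Push: put() adds an item on top.
stack.put("a")
stack.put("b")
stack.put("c")
print(stack.full())   # True

# Pop: get() removes and returns the most recently added item.
top = stack.get()
print(top)            # c
print(stack.qsize())  # 2
```

By default put() blocks when the stack is full and get() blocks when it is empty. LifoQueue is designed for sharing between threads, so it carries locking overhead that a plain list does not.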

Stack Implementation using Deque in Python Collections module.

This approach is similar to the implementation using a list, but it is more efficient. The sample program is given below. The program is self-explanatory.
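A minimal sketch with `collections.deque`, using the same append() and pop() calls as the list version; the values are arbitrary.

```python
from collections import deque

stack = deque()

# Push: append() adds the item on the right end, the top of the stack.
stack.append("a")
stack.append("b")
stack.append("c")
print(stack)  # deque(['a', 'b', 'c'])

# Pop: pop() removes and returns the item from the same end.
top = stack.pop()
print(top)         # c
print(len(stack))  # 2
```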

How to remove duplicates from a list in python ?

The following solution is a quick method to remove duplicate elements from a list using python.

values = [3,4,1,2,2,4,4,4,4,4,6,2,1,3,2,4,5,1,4]
values = list(set(values))
print("Curated List--->", values)

The above snippet of code is self-explanatory, and I hope it helps.
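One thing to keep in mind is that a set does not keep the original order of the elements. If the order matters, a possible variant (an addition to the method above, assuming Python 3.7 or later, where dictionaries preserve insertion order) is to use dict.fromkeys():

```python
values = [3, 4, 1, 2, 2, 4, 4, 4, 4, 4, 6, 2, 1, 3, 2, 4, 5, 1, 4]

# dict keys are unique and keep insertion order, so the first
# occurrence of each element decides its position.
unique_values = list(dict.fromkeys(values))
print("Curated List--->", unique_values)  # [3, 4, 1, 2, 6, 5]
```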