Bubble chart using Python

A bubble chart is a powerful and useful chart for representing data with three or four dimensions.

The position of the bubble is determined by the x & y axis values. These are the first two properties.

The size of the bubble can be controlled by the third property.

The colour of the bubble can be controlled by the fourth property.

A sample program that creates a bubble chart using the Python library matplotlib is given below.

import matplotlib.pyplot as plt
import numpy as np

# Create some dummy data using numpy's random function.
# Bubble charts are used to represent data with three or four dimensions.
# The X axis can represent one property, the Y axis another,
# the bubble size a third, and the bubble color a fourth.

x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)
colors = np.random.rand(50)

# Use the scatter function; s sets the bubble area and c sets the color.
plt.scatter(x, y, s=z * 1000, c=colors)
plt.show()

Here we are generating some random data using numpy and plotting the bubble chart using matplotlib.

A sample output is given below.

[Figure: Bubble Chart using Python]

Switch Case Statements in Python

Switch-case statements are popular conditional control statements in many programming languages. Surprisingly, Python did not have one for most of its history; a comparable match statement arrived only in Python 3.10.

Question: Is there a switch-case statement in Python?

Answer: In versions before 3.10, the direct answer is NO.

Alternative options for switch-case statements in Python

Option 1: Using if-elif-else statements. An example is given below.

if case == "case1":
    execute_func_case1()
elif case == "case2":
    execute_func_case2()
elif case == "case3":
    execute_func_case3()
else:
    execute_default_func()

Wow, excellent. The above code looks good, right? It works exactly like a switch-case statement, so why would we need switch-case statements in Python?

Have you noticed a problem? The if-elif-else chain is fine as long as there are only a few cases. Imagine the situation with 10 or more elif conditions. Now you see the problem, right?

Let's try the second option.

Option 2: Using a list in Python as an alternative to switch-case statements

An example is given below.

def add(a, b):
    return a + b

def sub(a, b):
    return a-b

case_funcs = [add, sub]

case_funcs[0](1, 2)
case_funcs[1](1, 2)

 

In the above program, we don't need if-elif-else blocks; instead, we call the function by its position (index) in the list. This looks better than the previous option, right? But what about the default case? And what if someone passes an index beyond the end of the list? It will raise an IndexError, and the list itself offers no built-in default case.
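
The missing default case can be worked around by checking the index before the call. A minimal sketch, where the helper call_case and the default function are hypothetical names added for illustration:

```python
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def default(a, b):
    # Hypothetical fallback used when the index is out of range.
    return "Default Return"


case_funcs = [add, sub]


def call_case(index, a, b):
    """Call the function at `index`, or `default` if the index is invalid."""
    if 0 <= index < len(case_funcs):
        return case_funcs[index](a, b)
    return default(a, b)


print(call_case(0, 1, 2))  # calls add
print(call_case(5, 1, 2))  # falls back to default
```

The explicit range check also rejects negative indices, which would otherwise silently select an entry counted from the end of the list.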

Option 3: Using a dictionary as an alternative to switch-case statements in Python

An example is given below.

def add(a, b):
    return a + b

def sub(a, b):
    return a-b

case_funcs = {'sum':add, 'subtract':sub}

case_funcs['sum'](1,2)

 

Here the implementation is much closer to a switch-case statement. We use a key to identify, or route to, the required case or function. The keys can be any hashable value and are not limited to indices or positions.

Now let's talk about the drawbacks of the above implementation. It raises a KeyError if we pass an unknown key, and there is no default case. How do we handle these problems?

Check the program below.

def add(a, b):
    return a + b

def sub(a, b):
    return a-b

def default(a, b):
    return "Default Return"

case_funcs = {'sum':add, 'subtract':sub}

# sum is the key for the add(). 
# default is the default function that gets called for non existent keys
# (1, 2) are the arguments for the function
print(case_funcs.get('sum',default)(1,2))

 

A Python dictionary has a get() method that returns the value for a given key. It has one more feature: we can supply a default value to be returned for non-existent keys. Wow, now we have the solution.

So by using this feature, we can implement a switch-case-like construct in Python.
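
To show the default branch actually being taken, the lookup can be wrapped in a small helper. This is only a sketch; the name switch is hypothetical and not part of Python.

```python
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def default(a, b):
    return "Default Return"


case_funcs = {'sum': add, 'subtract': sub}


def switch(key, a, b):
    """Dispatch to the function registered for `key`, else to `default`."""
    return case_funcs.get(key, default)(a, b)


print(switch('sum', 1, 2))       # known key
print(switch('multiply', 1, 2))  # unknown key, handled by default
```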

Stream Processing Framework in Python – Faust

I was looking for a highly scalable streaming framework in Python. Until now I have been using Spark Streaming for reading data from high-throughput streams. But I felt Spark was a little heavy, as its minimum system requirements are high.

The other day I was researching this and found a framework called Faust. I started exploring it, and my initial impression is very good.

This framework can run in a distributed way, so we can run the same program on multiple machines. This improves performance.

I tried executing the sample program from their website and it worked properly. The same program is pasted below. I used CDH Kafka 4.1.0. The program worked seamlessly.

To execute the program, I have used the following command.

python sample_faust.py worker -l info

The above program reads data from Kafka and prints each message. This framework is not just about reading messages in parallel from streaming sources. It also integrates with RocksDB, an embedded key-value data store that was open-sourced by Facebook and is written in C++.

Merge two dictionaries in Python

This is the simplest way to merge or combine two dictionaries in Python. This operation is supported in Python version 3.5 and above.
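
The snippet for this post is not reproduced here. A minimal sketch, assuming the post refers to the dictionary unpacking syntax (`**`) inside a dict literal, which was added in Python 3.5; the variable names and input values are assumptions chosen to match the sample output.

```python
first = {'p': 2, 'q': 4}
second = {'r': 6, 's': 8}

# Unpack both dictionaries into a new one; on duplicate keys,
# the value from the right-most dictionary wins.
merged = {**first, **second}
print(merged)
```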

 

Sample Output

{'p': 2, 'q': 4, 'r': 6, 's': 8}

How to convert a CSV file to a JSON file?

Sometimes we receive a dataset in CSV format and need to convert it to JSON. This conversion can be done in several ways; one approach is detailed below. The following program converts a CSV file into a multiline JSON file (one JSON object per line). You can modify the field names to suit your requirement and reuse the program.

The sample input is given below.

1001,Amal,Jose,100000
1002,Edward,Joe,100001
1003,Sabitha,Sunny,210000
1004,John,P,50000
1005,Mohammad,S,75000

 

The multiline JSON output is given below.

{"EmpID": "1001", "FirstName": "Amal", "LastName": "Jose", "Salary": "100000"}
{"EmpID": "1002", "FirstName": "Edward", "LastName": "Joe", "Salary": "100001"}
{"EmpID": "1003", "FirstName": "Sabitha", "LastName": "Sunny", "Salary": "210000"}
{"EmpID": "1004", "FirstName": "John", "LastName": "P", "Salary": "50000"}
{"EmpID": "1005", "FirstName": "Mohammad", "LastName": "S", "Salary": "75000"}
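
The conversion program itself is not shown above. A minimal sketch using only the standard library csv and json modules follows; the function name csv_to_json_lines and the file names in the usage comment are hypothetical, and the field names are taken from the sample output.

```python
import csv
import io
import json

FIELD_NAMES = ["EmpID", "FirstName", "LastName", "Salary"]


def csv_to_json_lines(csv_file, json_file, field_names=FIELD_NAMES):
    """Write one JSON object per CSV row (multiline JSON)."""
    reader = csv.DictReader(csv_file, fieldnames=field_names)
    for row in reader:
        json_file.write(json.dumps(row))
        json_file.write("\n")


# In-memory demonstration with two rows of the sample input.
sample_input = io.StringIO("1001,Amal,Jose,100000\n1002,Edward,Joe,100001\n")
sample_output = io.StringIO()
csv_to_json_lines(sample_input, sample_output)
print(sample_output.getvalue(), end="")

# With real files (hypothetical names):
# with open("input.csv", newline="") as src, open("output.json", "w") as dst:
#     csv_to_json_lines(src, dst)
```

All values are written as strings because the csv module does not infer types; converting Salary to a number would need an explicit int() call.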

 

 

Visualization using Python

Python is a powerful programming language. It can be used for developing almost all types of applications. I have used Python for developing IoT applications, data science applications, statistical applications, web services, automation, networking, web applications, big data processing, visualization, etc.

In this blog post, I will introduce some of the powerful visualization libraries available in Python.

  • Pandas Visualization – The core of this library is matplotlib.
  • Matplotlib – This is one of the most popular visualization libraries in python.
  • ggplot – Based on R’s ggplot2
  • Seaborn – A data visualization library based on matplotlib. It provides a high-level interface for drawing statistical graphics.
  • Plotly – An open-source, interactive graphing library for Python

 

What is Pandas?

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. Pandas comes with two primary data structures:

  • Series – (One dimensional)
  • DataFrame – (Two dimensional)

These two structures help us handle the majority of use cases. Those who are comfortable with the R programming language can implement their logic in a powerful way using Python pandas, since users get almost all the functionality present in R's data frame. Pandas is built on top of the popular NumPy package.

Pandas has very good time series handling and processing capabilities. We can avoid unnecessary loops and logic by using pandas. It supports:

  • Frequency conversion (e.g. creating 5-minute data from a dataset with 1-second frequency)
  • Date range generation
  • Moving window statistics
  • Date shifting, etc.
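
The capabilities listed above can be sketched briefly. This assumes pandas is installed; the synthetic one-second data is made up purely for illustration.

```python
import pandas as pd

# Date range generation: ten minutes of timestamps, one per second.
index = pd.date_range("2024-01-01", periods=600, freq="s")
series = pd.Series(range(600), index=index, dtype="float64")

# Frequency conversion: aggregate 1-second data into 5-minute means.
five_min = series.resample("5min").mean()
print(five_min)

# Moving window statistics: mean over a sliding window of 3 samples.
rolling_mean = series.rolling(window=3).mean()

# Date shifting: move every value one step later in time.
shifted = series.shift(1)
```

None of these steps needs an explicit Python loop, which is the main convenience the list describes.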

Since there is already a lot of documentation on pandas, I am not going to explain it in detail. I will cover some use cases with pandas implementations in further blog posts, where I will use pandas and other scientific libraries extensively.

 

Basic statistics using Python

Python comes with a built-in statistics module. This will help us to perform the statistical calculations very easily.

The following are the commonly used statistical functions.

Arithmetic Mean

Arithmetic mean is the average of a group of values. The mathematical equation is

Mean = Sum of group of values / Total number of values in the group

Mean vs Average: What’s the Difference?

Answer: Both are the same. In everyday usage, "average" refers to the arithmetic mean.

Suppose we have a list of values as shown below.

values = [1,2,3,4,5,6,7,8]

For calculating the mean, without using any built-in function, we have to use the following snippet of the code

values = [1,2,3,4,5,6,7,8]
total = 0  # named total to avoid shadowing the built-in sum()
for value in values:
    total += value

mean = total/len(values)
print("Sum -->:", total)
print("Total Count-->:", len(values))
print("Arithmetic Mean-->:", mean)

The above program involves multiple steps. Instead of writing the entire logic, we can easily calculate the mean using the following code snippet

import statistics
values = [1,2,3,4,5,6,7,8]
print("Arithmetic Mean--> ", statistics.mean(values))

Mode

The mode is the most frequently occurring value in a data set. It can be calculated very easily using the statistics.mode() function.

import statistics
values = [1,2,2,2,2,2,2,1,2,3,4,5,2,3,4,5,6,66,6,6,6,6]
print(statistics.mode(values))
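
When several values tie for the highest frequency, statistics.multimode() (available from Python 3.8) returns all of them. A short sketch with made-up data:

```python
import statistics

# Both 1 and 2 occur twice, so this data set has two modes.
tied_values = [1, 1, 2, 2, 3]
print(statistics.multimode(tied_values))
```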

Median

The median is the middle value of a numerical data set. It is calculated by ordering the data set from lowest to highest and finding the number in the exact middle. If the count of values is odd, the median is the number in the exact middle of the ordered list. If the count is even, the median is the mean of the two numbers in the middle of the ordered list.

This can be calculated simply with the statistics.median() function.

import statistics
values = [21,1,2,3,4,5,6,7,8,24,29,50]
print("Arithmetic Median--> ", statistics.median(values))
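
The odd and even rules described above can be written out by hand. The following is a sketch for comparison only; the helper name manual_median is hypothetical, and it does not handle an empty list.

```python
import statistics


def manual_median(numbers):
    """Compute the median by sorting and picking the middle value(s)."""
    ordered = sorted(numbers)
    count = len(ordered)
    middle = count // 2
    if count % 2 == 1:
        # Odd count: the single middle element.
        return ordered[middle]
    # Even count: mean of the two middle elements.
    return (ordered[middle - 1] + ordered[middle]) / 2


values = [21, 1, 2, 3, 4, 5, 6, 7, 8, 24, 29, 50]
print(manual_median(values))
print(manual_median(values) == statistics.median(values))
```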

 

Functions as Objects in Python

Python is very powerful and easy to learn. Applications can be developed very quickly in Python because of its simplicity.

Everything in Python is an object, and this includes functions. Are you aware of the following features of functions in Python? I was not aware of them during my first few years.

  • Functions can be passed as arguments to other functions.
  • Functions can be the elements inside various data structures like lists, dictionaries etc.

A few examples

Function as argument to another function

A sample Python program that shows how to use a function as an argument to another function is given below.
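
The original sample is not reproduced here. As a minimal sketch, with the hypothetical function names apply_operation, add and sub:

```python
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def apply_operation(operation, a, b):
    """Call whichever function was passed in as `operation`."""
    return operation(a, b)


# The functions are passed by name, without parentheses.
print(apply_operation(add, 5, 3))
print(apply_operation(sub, 5, 3))
```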

Functions as elements within data structures like list or dict()

A simple implementation that passes a list of functions as an argument to another function is shared below.
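
The original sample is not reproduced here either. A minimal sketch, with the hypothetical names double, square and apply_all:

```python
def double(x):
    return x * 2


def square(x):
    return x * x


def apply_all(functions, value):
    """Apply each function in the list to `value` and collect the results."""
    return [function(value) for function in functions]


print(apply_all([double, square], 3))

# Functions can also be stored as dictionary values.
operations = {"double": double, "square": square}
print(operations["square"](4))
```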

I hope this will help someone. 🙂

 

How to remove duplicates from a list in Python?

The following solution is a quick method to remove duplicate elements from a list using Python.

values = [3,4,1,2,2,4,4,4,4,4,6,2,1,3,2,4,5,1,4]
values = list(set(values))
print("Curated List--->", values)

The above snippet of code is self-explanatory. Note that converting to a set does not preserve the original order of the elements. I hope this helps.
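
If the original order of the elements matters, one alternative is dict.fromkeys(), since dictionaries preserve insertion order from Python 3.7. A minimal sketch using the same input list; the variable name unique_in_order is an assumption:

```python
values = [3, 4, 1, 2, 2, 4, 4, 4, 4, 4, 6, 2, 1, 3, 2, 4, 5, 1, 4]

# Dictionary keys are unique and keep their first-seen order.
unique_in_order = list(dict.fromkeys(values))
print("Curated List--->", unique_in_order)
```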