How to convert or change the data type of columns in Pandas dataframe ?

Changing the data type of columns in a pandas dataframe is very easy. Here I am using the astype() function to perform the typecast operation. Refer to the following example. The type conversion happens in line number 10 of the code.

 

You can typecast as many columns as you want in a single call. For example, if you want to typecast the columns emp_id and salary, use the following syntax.

> df = df.astype({'salary': 'int', 'emp_id': 'int'})
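
As a rough illustration of the syntax above, here is a minimal sketch. The dataframe and its values are made up for this example, and the emp_id and salary columns are assumed to arrive as strings.

```python
import pandas as pd

# Hypothetical data: emp_id and salary are stored as strings on purpose.
df = pd.DataFrame({
    "emp_id": ["1001", "1002", "1003"],
    "name": ["Amal", "Edward", "Sabitha"],
    "salary": ["100000", "100001", "210000"],
})
print(df.dtypes)  # data types before the conversion

# Typecast both columns in one call by passing a {column: type} mapping.
df = df.astype({"salary": "int", "emp_id": "int"})
print(df.dtypes)  # emp_id and salary are now integer columns
```

Columns that are not listed in the mapping (name in this sketch) keep their original type.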

 

rpm: /usr/bin/rpmspec: No such file or directory – CentOS RHEL

I faced an issue while building an rpm on a CentOS machine. The error was rpm: /usr/bin/rpmspec: No such file or directory. To fix this issue, install the following package and re-run the build.

For CentOS 7 or RHEL 7 users

> sudo yum install rpm-build

For CentOS 8 or RHEL 8 users

> sudo dnf install rpm-build

 

How to install Python in CentOS 8 / RHEL 8 ?

Python is not available by default in CentOS 8 / RHEL 8. Read the following blog post to install Python and pip in CentOS 8 / RHEL 8.

To Install Python 3 in CentOS 8 or RHEL 8

> sudo dnf install python3

CentOS 8 and RHEL 8 do not have an unversioned python by default. We have to set it explicitly, so simply typing python will give you a “command not found” response. To verify the installation, use the following command

> python3 -V

The above command will print the version information. For me it printed Python 3.6.8

To install pip, execute the following command

> sudo dnf install python3-pip

Check the installation

> pip3 --version

If you simply type the command python in the shell, it will give you a response something like below

bash: python: command not found…

To enable the command python, execute the following command.

> sudo alternatives --set python /usr/bin/python3

This will enable the python command. Now you can use python without explicitly typing the version.

Note: Follow the steps below only if you need Python 2. If your requirement is Python 3, refer to the steps described above.

 

To Install Python 2 in CentOS 8 or RHEL 8

> sudo dnf install python2

To install pip, execute the following command

> sudo dnf install python2-pip

Now check the installation

> pip2 --version

To set python2 as the default python across the system, execute the following command.

> sudo alternatives --set python /usr/bin/python2

 

In previous versions of CentOS and RHEL, many parts of the system depended on the unversioned python, so installing Python 3 and Python 2 together created a mess. Now in CentOS 8 and RHEL 8, it is very easy.

Hope this blog helps. Please comment below if you face any issues. 🙂

 

 

Python program to find the timezone from latitude and longitude ( geo coordinates )

We all know that there are several timezones in the world. While developing applications that are used by people across the world, we have to consider the user's timezone. So depending upon their location, we have to display the parameters or values. I am sharing a simple python code snippet that finds the timezone based on the latitude and longitude.

This is a very simple program. There is a powerful package in python called timezonefinder. We are using this package for finding the timezone information. This package works with python versions above 3.6. This is a quick way to find the timezone using geo coordinates.

The following command installs the package

pip install timezonefinder[numba]

Sample Program
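
A minimal sketch of such a program is given here. It assumes the timezonefinder package installed above and uses its TimezoneFinder class and timezone_at() method. The find_timezone helper, its range checks and the optional finder argument are my own additions for illustration and are not part of the package.

```python
def find_timezone(latitude, longitude, finder=None):
    """Return the timezone name for a coordinate, or None if nothing matches."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if finder is None:
        # Imported here so the helper can also be tried with a stand-in finder.
        from timezonefinder import TimezoneFinder
        finder = TimezoneFinder()
    # timezone_at() takes the longitude as lng and the latitude as lat.
    return finder.timezone_at(lng=longitude, lat=latitude)
```

For example, find_timezone(52.5061, 13.358) is expected to return a timezone name such as Europe/Berlin. If many lookups are needed, it is better to create one TimezoneFinder object and pass it in as finder, instead of creating a new one for every call.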

 

This package works offline. That means you do not need to be connected to the internet to get this working. It covers the entire earth. In this way we can find the timezone information with a few lines of code. Hope this helps.

Bubble chart using Python

A bubble chart is a powerful and useful chart for representing data with three or four dimensions.

The position of the bubble is determined by the x & y axis values. These are the first two properties.

The size of the bubble can be controlled by the third property.

The colour of the bubble can be controlled by the fourth property.

A Sample program to create a bubble chart using the python library matplotlib is given below.

import matplotlib.pyplot as plot
import numpy as npy

# create some dummy data using numpy random function.
# Bubble charts are used to represent data with three or four dimensions.
# X axis can represent one property, Y can represent another property,
# The bubble size can represent another property, and the color of the bubble can represent another property.

x = npy.random.rand(50)
y = npy.random.rand(50)
z = npy.random.rand(50)
colors = npy.random.rand(50)
# use the scatter function
plot.scatter(x, y, s=z * 1000, c=colors)
plot.show()

Here we are generating some random data using numpy and plotting the bubble chart using matplotlib.

A sample output is given below.

[Figure: Bubble Chart using Python]

 

Production deployment of a Python Web Service (Flask / Tornado Application)

Python Flask and Tornado are two of the most popular frameworks in python for developing RESTful services.

Do you know how to develop and deploy a production grade python application ?

A sample python flask web service is given below (sample_flask.py). It has only one endpoint (/requestme), which is a GET method. I am not focusing on the coding standards. My goal is to show you the production implementation of a python application.
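
A minimal sketch of what sample_flask.py could look like is given here. The /requestme endpoint and port 9090 come from this post. The response body, the function names and the host value are assumptions made for illustration.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/requestme", methods=["GET"])
def request_me():
    # Hypothetical payload; a real service would return its own data.
    return jsonify({"message": "Hello from the sample flask service"})


def main():
    # Flask's built-in development server, listening on port 9090.
    app.run(host="0.0.0.0", port=9090)
```

To start the service with python sample_flask.py, call main() under the usual if __name__ == "__main__": guard at the bottom of the file. The module-level name app is what a WSGI server such as gunicorn looks up.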

We can run this program in the command line by executing the following command.

> python sample_flask.py

The service will be up and running on port 9090. You will be able to make requests to the application by using the URL http://ipaddress:9090/requestme.

How many requests can this python web service handle ?

10 or 20 or 100 ?? … Any guess ??

Definitely this is not going to handle too many requests. This is good for development trials and experimental purposes, but we cannot deploy something like this in a production environment.

How to scale python applications  ?

Refer to the below diagram. The diagram has multiple instances of flask applications with Gunicorn WSGI proxied and load balanced through Nginx web server.

[Figure: Production Deployment of Python Flask Application]

Sample Nginx configuration that implements the reverse proxy and load balancing is given below. 

This is a sample configuration and this does not have the advanced parameters.

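server {
    listen 80;
    server_name myserverdomain;

    location / {
        # forward incoming requests to the upstream group defined below
        proxy_pass http://upstream_backend/requestme;
    }
}

# the upstream name must match the host used in proxy_pass
upstream upstream_backend {
    server gunicornapplication1:8080;
    server gunicornapplication2:8080;
}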

 

The upstream section routes the requests to the two gunicorn backends in a round robin manner, which is the default load balancing method in Nginx. We can add as many backend servers as we need based on the load.

How to run the python applications with gunicorn ?

First, let's install gunicorn

> pip install gunicorn

Now it is simple, run the following command.

> gunicorn -w 4 app:app

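Now our application will run with 4 workers. Each worker is a separate process and will be able to handle requests. Gunicorn will take care of distributing the requests among the workers. In app:app, the first part is the module name (the file name without .py) and the second part is the application object inside it, so for sample_flask.py with an object named app the argument becomes sample_flask:app. By default gunicorn listens on 127.0.0.1:8000. Pass -b 0.0.0.0:8080 to listen on the port used in the Nginx configuration above.

We can start multiple gunicorn instances like this and keep them behind nginx. This is the way to scale our python applications.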

Hope this helps 🙂 

“The Zen of Python”, by Tim Peters

Every Python Developer should try and read these statements periodically. In the python interpreter, type the following statement.

> import this

You will see the following response. Read it and refresh.

[Figure: output of import this, The Zen of Python]

ImportError: libSM.so.6: cannot open shared object file: No such file or directory

For CentOS users

yum install libXext libSM libXrender

For Ubuntu users

apt-get update && apt-get install -y libsm6 libxext6 libxrender1 libfontconfig1

 

Merge two dictionaries in Python

This is the simplest way to merge or combine two dictionaries in python. This operation is supported in python version 3.5 and above.
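
A minimal sketch of such a merge, assuming the dictionary unpacking syntax (**) that was introduced in python 3.5. The variable names are chosen only for illustration.

```python
first = {"p": 2, "q": 4}
second = {"r": 6, "s": 8}

# Unpack both dictionaries into a new one; the inputs are left unchanged.
merged = {**first, **second}
print(merged)
```

If the same key is present in both dictionaries, the value from the dictionary on the right wins.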

 

Sample Output

{'p': 2, 'q': 4, 'r': 6, 's': 8}

Convert csv to json using pandas

The following sample programs explain how to read a csv file and convert it into json data. Two programs are explained in this blog post. The first program expects the column names in the csv file and the second program does not need column names in the file.

The first program expects the headers in the first line of the csv. In case of missing headers, we have to pass it explicitly in the program.

Sample Input

EMPID,FirstName,LastName,Salary
1001,Amal,Jose,100000
1002,Edward,Joe,100001
1003,Sabitha,Sunny,210000
1004,John,P,50000
1005,Mohammad,S,75000

Here the first line of the csv data is the header

Sample Output

[{"EMPID":1001,"FirstName":"Amal","LastName":"Jose","Salary":100000},{"EMPID":1002,"FirstName":"Edward","LastName":"Joe","Salary":100001},{"EMPID":1003,"FirstName":"Sabitha","LastName":"Sunny","Salary":210000},{"EMPID":1004,"FirstName":"John","LastName":"P","Salary":50000},{"EMPID":1005,"FirstName":"Mohammad","LastName":"S","Salary":75000}]
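
A minimal sketch of the first program is given here. To keep it self-contained, the sample input is read from a string through io.StringIO instead of a file. With a real file, pass its path to read_csv.

```python
import io

import pandas as pd

# The sample input from above; the first line is the header.
csv_data = io.StringIO(
    "EMPID,FirstName,LastName,Salary\n"
    "1001,Amal,Jose,100000\n"
    "1002,Edward,Joe,100001\n"
    "1003,Sabitha,Sunny,210000\n"
    "1004,John,P,50000\n"
    "1005,Mohammad,S,75000\n"
)

# read_csv takes the column names from the first line by default.
df = pd.read_csv(csv_data)

# orient="records" produces a json list with one object per csv row.
json_data = df.to_json(orient="records")
print(json_data)
```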

 

If headers are not present in the csv file, we have to explicitly pass the field names in a list to the argument names. Duplicates in this list are not allowed. If the csv file does contain a header row and you want to override its column names, pass header=0 along with names so that the existing header line is replaced instead of being read as data. A sample implementation is given below.
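
A minimal sketch covering both cases. The column list and the short inline csv strings are assumptions made for illustration. With a real file, pass its path to read_csv.

```python
import io

import pandas as pd

columns = ["EMPID", "FirstName", "LastName", "Salary"]

# Case 1: the csv has no header line, so the names are passed explicitly.
no_header = io.StringIO("1001,Amal,Jose,100000\n1002,Edward,Joe,100001\n")
df_no_header = pd.read_csv(no_header, names=columns)
json_no_header = df_no_header.to_json(orient="records")

# Case 2: the csv has a header line that we want to replace.
# header=0 tells pandas that line 0 is a header, so it is not read as data.
with_header = io.StringIO("id,first,last,pay\n1001,Amal,Jose,100000\n")
df_renamed = pd.read_csv(with_header, names=columns, header=0)
json_renamed = df_renamed.to_json(orient="records")

print(json_no_header)
print(json_renamed)
```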