
How to maintain packages in a Python project?

In most cases, we need external packages for developing a Python program. These external packages are either available in the PyPI repository or available locally as archive files. Usually people just install the packages directly into the Python environment using the pip command.

By default, the pip command installs packages from the PyPI repository. If we do not specify a version, it selects the latest available version of the package supported by the Python present in the environment (Python 2 or 3). Because of this behaviour, pip will always pick up the latest version of a package, and packages may undergo drastic changes between releases. For example, an application developed with version X of a package may not work with version Y of the same package. So simply noting down the package names will not help to manage the project; we need the list of all packages with their versions. Manually installing the packages one by one is also a difficult task, because there can be several tens of packages within a single project.

The best practices for managing packages in a project are:

  1. Use a Python virtual environment.
  2. Create a requirements.txt file to maintain the package details.

The details on how to create and use a virtual environment are explained in my previous post.

requirements.txt is a simple text file that maintains all the package dependencies with their versions. A sample format is given below:

Click==7.0
Flask==1.0.2
Flask-Cors==3.0.7
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
six==1.12.0
Werkzeug==0.14.1

All the packages listed in the file can then be installed with a single command:

pip install -r requirements.txt

The packages already installed in an environment can be captured in a requirements.txt file in one shot using the following command:

pip freeze > requirements.txt
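
Exact pins with == give the most reproducible builds, but pip also accepts range specifiers where some flexibility is needed. The entries below are a hypothetical example of such specifiers:

Flask>=1.0,<2.0
requests~=2.21
six!=1.11.0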

This practice helps developers manage the dependency list and makes code migration easy.


Virtual environment in Python

What is a Virtual Environment?

A virtual environment is a tool that helps developers segregate and maintain the dependencies required by different projects by creating isolated Python environments for them.

Need for a Virtual Environment?

Suppose User A and User B are working on two different projects.

The package requirements for User A are given below:

Click==7.0
Flask==1.0.2
Flask-Cors==3.0.7
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
six==1.12.0
Werkzeug==0.14.1

The package requirements for User B are given below:

Click==6.0
Flask==1.0.1
Flask-Cors==3.0.2
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
six==1.11.0
Werkzeug==0.11.2

As you can see, the two developers use different versions of the same packages. Python does not have the ability to differentiate between multiple versions of the same package in the site-packages directory. By default, package installation happens in the default site-packages directory of the Python installation. On Unix-like operating systems, this location is owned by the root user by default, so a normal user cannot install packages without elevated privileges.

A virtual environment plays its role in the following scenarios:

  • To isolate the dependencies required by different projects.
  • To keep the base Python packages untouched. In a multi-user environment, upgrading or modifying a shared package might disrupt the operation of one or more projects.
  • To give end users easy access to install and manage Python packages without system-level elevated privileges.
  • To easily manage the dependencies used in a specific project. We can copy the virtual environment to another system with the same Python version, and the environment is easy to replicate by dumping the package list (pip freeze).

How to create a virtual environment?

To create a virtual environment, we need the virtualenv package installed in the base Python environment.

As the root user or with elevated privileges, execute the following command:

pip install virtualenv

Then create the virtual environment with the following command. The virtualenv command will be available only if the package was installed in the base Python:

virtualenv <path for virtual environment>

You can specify any writable location as the path for the virtual environment. All the virtual environment related files and packages will be installed in this directory. It may take a few seconds to complete the virtual environment setup.

Now type

which python

This will still be pointing to the base Python. To use the virtual environment, we need to activate it:

source <path of virtual environment>/bin/activate

The above command activates the virtual environment in the current session. To make it enabled in all sessions by default, add the same line to your .bashrc file.
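
For example, assuming a hypothetical virtual environment created at ~/venvs/myproject, the line to append to .bashrc would be:

source ~/venvs/myproject/bin/activate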

Now type which python again and check the result. It will be pointing to the newly created virtual environment.

To deactivate the environment, simply type deactivate on the command line.

 

How to develop a background function in Python?

This is an example of executing a function in the background. I was searching for an option to run a function in the background alongside the normal execution flow.
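
A minimal sketch of the idea, using a decorator built on the standard threading module, is given below. The decorator name background matches the description here, while the sample task and the sleep durations are just illustrative.

import threading
import time

def background(f):
    """Run the decorated function in a separate daemon thread."""
    def wrapper(*args, **kwargs):
        t = threading.Thread(target=f, args=args, kwargs=kwargs)
        t.daemon = True  # the thread will not block interpreter exit
        t.start()
        return t
    return wrapper

@background
def long_running_task():
    # Replace this logic with your own background work
    time.sleep(5)
    print("background task finished")

long_running_task()           # returns immediately
print("main flow continues")  # printed before the task finishes
time.sleep(6)                 # keep the demo alive until the task is done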


The main execution continues in the same flow without waiting for the background function to complete, while the function marked for background execution keeps running in the background.


You can modify this code based on your requirements. Just replace the logic inside the function under the @background decorator. Hope this tip helps 🙂


How to containerize a Python Flask application?

Containerization is one of the fastest growing and most powerful technologies in the software industry. With this technology, users can build, ship and deploy applications (standalone and distributed) seamlessly. Here are the simple steps to containerize a Python Flask application.

Step 1:
Develop your Flask application. For demonstration I am using a very simple Flask application here; you can use yours and proceed with the remaining steps. If you are new to this technology, I would recommend starting with this simple program. As usual with all tutorials, here also I am using a "Hello World" program. Since we are discussing Docker, we can call it "Hello Docker". I will demonstrate the containerization of a more advanced application in my next post.

import json
from flask import Flask

app = Flask(__name__)

@app.route("/requestme", methods = ["GET"])
def hello():
    response = {"message":"Hello Docker.!!"}
    return json.dumps(response)


if __name__ == '__main__':
    app.run(host="0.0.0.0", port=9090, debug=True)

Step 2:
Ensure the project is properly packaged and the dependencies are listed in requirements.txt. A properly packaged project is easy to manage. All the dependent packages are required in the code execution environment, and they will be installed based on requirements.txt, so prepare the dependency list properly and add it to the requirements.txt file. Since our program is a simple one-module application, there is not much to package. Here I am keeping the Python file and requirements.txt in a folder named myproject (not using any package structure).
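
For this simple application, the requirements.txt only needs Flask. The exact pin below is an assumption; use whatever version you developed against:

Flask==1.0.2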

 

Step 3:
Create the Dockerfile. The file should be named "Dockerfile". Here I have used the python:2 base image. If you use python:3, then Python 3 will be the base image. So based on your requirement, you can select the base image.

# Base image; use python:3 if you want Python 3
FROM python:2
# Copy the project folder into the image
ADD myproject /myproject
WORKDIR /myproject
# Install the dependencies listed in requirements.txt
RUN pip install -r requirements.txt
# Start the Flask application
CMD [ "python", "myflaskapp.py" ]

Ensure you create the Dockerfile without any extension; Docker may not recognize a file with a .txt extension.

Step 4:
Build an image using the Dockerfile. Ensure the Python project and the Dockerfile are kept in the proper locations.
Run the following command from the location where the Dockerfile is kept. The syntax of the command is given below:

docker build -t [imagename]:[tag] [location]

The framed command is given below. Here I am executing the build command from the same location as the Dockerfile and the project, so I am using '.' (dot) as the location. If the Dockerfile is located somewhere else, you can specify it using the -f or --file option.

docker build -t myflaskapp:latest .

Step 5:
Run a container from the image

docker run -d -p 9090:9090 --name myfirstapp myflaskapp:latest

Step 6:
Verify the application
List the running containers

docker ps | grep myfirstapp
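
The application itself can be verified with a plain HTTP request to the route defined in the code:

curl http://localhost:9090/requestme

This should return the JSON response {"message": "Hello Docker.!!"}.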

Now your application is containerized.


Step 7:
Save the Docker image locally. The following command will save the image as a tar file. You can take this file to any other environment and use it.

docker save myflaskapp > myflaskapp.tar
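
On the target machine, the saved image can be loaded back with:

docker load < myflaskapp.tar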

You can also push the Docker image to Docker Hub.

In this way you can ship and run your application anywhere.

Common dependencies to install PyCrypto package in CentOS/RHEL

The installation of the pycrypto package may fail with errors like the following:

error: no acceptable C compiler found in $PATH

RuntimeError: autoconf error

fatal error: Python.h: No such file or directory

#include "Python.h"
                   ^
compilation terminated.
error: command 'gcc' failed with exit status 1

The solution for this issue is to install the following dependent packages and then retry the pycrypto installation:

yum install gcc

yum install gcc-c++

yum install python-devel

pip install pycrypto

Python code to list all the running EC2 instances across all regions in an AWS account

This code snippet will help you get the list of all running EC2 instances across all regions in an AWS account. I have used the Python boto3 package to develop the code. The code dynamically picks up all the AWS EC2 regions, so it will work without any modification even if a new region gets added to AWS.

Note: Only the basic API calls needed to list the instance details are included in this program. Proper coding conventions are not followed. 🙂
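
A minimal sketch of such a script is given below. It assumes AWS credentials are already configured (for example via environment variables or ~/.aws/credentials); the printed fields are just examples.

import boto3

# Any region works for the describe_regions call; us-east-1 is used here
client = boto3.client("ec2", region_name="us-east-1")

# Discover all EC2 regions dynamically
regions = [r["RegionName"] for r in client.describe_regions()["Regions"]]

for region in regions:
    ec2 = boto3.resource("ec2", region_name=region)
    # Keep only the instances that are currently running
    running = ec2.instances.filter(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for instance in running:
        print(region, instance.id, instance.instance_type, instance.public_ip_address)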

Changing the python version in pyspark

pyspark will pick one version of Python from the multiple versions installed on the machine. In my case, I have Python 3, 2.7 and 2.6 installed, and pyspark was picking Python 3 by default. If we have to change the Python version used by pyspark, set the following environment variable and then run pyspark:

export PYSPARK_PYTHON=python2.6

Similarly, we can configure any version of Python with pyspark. Ensure that python2.6, or whatever interpreter you specify, is actually available on the machine.
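
To confirm which interpreter pyspark picked up, you can check the version from inside the pyspark shell:

import sys
print(sys.version)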

Programmatic way to reboot EC2 instances

Sometimes we might have to reboot EC2 instances. If the requirement is to restart EC2 instances regularly, we can achieve it by writing a small piece of code. I came across a similar requirement, and a portion of the code I used is given below.
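
A minimal sketch using boto3 is shown here; the region and the instance IDs are placeholders that you should replace with your own.

import boto3

# Placeholder values; replace with your own region and instance IDs
REGION = "us-east-1"
INSTANCE_IDS = ["i-0123456789abcdef0"]

client = boto3.client("ec2", region_name=REGION)

# Trigger the reboot and print the HTTP status of the request
response = client.reboot_instances(InstanceIds=INSTANCE_IDS)
print(response["ResponseMetadata"]["HTTPStatusCode"])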

 

How to hide or obfuscate Python source code?

Sometimes we may have a requirement to deliver applications without the source code. In Java this is very easy, and it is widely done. If we want to hide our source code in Python, what can we do?

I checked several solutions for obfuscating the source code. One is pyminifier, which is a good tool: it renames the methods and variables, so the obfuscated code looks more complicated. But if you spend some time on it, you can still read it.

Another way to hide the source code completely is to use the compiler built into Python itself. This generates byte code, and we can use that for execution.

python -OO -m py_compile <your code.py>

This will generate a .pyo file. Rename the .pyo file to a .py extension and you can use it for execution; it will work just like the actual code.
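
For example, assuming Python 2 (where the -OO option writes a .pyo file next to the source) and a hypothetical script named mycode.py:

python -OO -m py_compile mycode.py
mv mycode.pyo mycode.py
python mycode.py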

NB: If your program imports modules obfuscated like this, then you have to rename them with a .pyc suffix instead.

Programmatic Data Upload to Amazon S3

S3 is a storage service provided by Amazon; the full form is Simple Storage Service. It is a very useful service for a low price. Data can be uploaded to and downloaded from S3 very easily, using tools as well as programs. Here I am explaining a sample program for uploading a file to S3 using Python.

Files can be uploaded to S3 in two ways. One is the normal upload and the other is the multipart upload. The normal upload sends the file serially and is not suitable for large files, as it takes more time. For large files, multipart upload is the best option: it divides the file into chunks, uploads them in parallel, and S3 reassembles them.

This program uses the normal approach for sending files to S3. Here I used the boto library for uploading the files.
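
A minimal sketch using the classic boto S3 API is given below; the credentials, the bucket name and the file names are placeholders to be replaced with your own.

import boto
from boto.s3.key import Key

# Placeholder credentials and names; replace with your own
AWS_ACCESS_KEY = "<access-key>"
AWS_SECRET_KEY = "<secret-key>"
BUCKET_NAME = "my-bucket"

# Connect to S3 and open the target bucket
conn = boto.connect_s3(AWS_ACCESS_KEY, AWS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)

# A Key object represents the object name inside the bucket
key = Key(bucket)
key.key = "uploads/sample.txt"

# Normal (serial) upload of a local file
key.set_contents_from_filename("sample.txt")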