CDH cluster installation failing in “distributing” stage: failure due to stall on seeded torrent

I faced this issue while distributing the downloaded packages in Cloudera Manager.

The solution that worked for me was to add the IP address to hostname mappings to the /etc/hosts file on the Cloudera Manager server and on every agent host.

/etc/hosts

192.168.0.101   cdhdatanode1
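To confirm that the entry is in place on a host, the check can be scripted. The following is only a minimal sketch, not part of Cloudera Manager: hosts_ip_for is a hypothetical helper name, and it assumes a POSIX shell with awk.

```shell
# hosts_ip_for is a hypothetical helper: print the IP address that a hosts
# file maps to the given hostname, or fail if there is no such entry.
hosts_ip_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }      # drop comments before matching
        {
            # Field 1 is the IP address; the remaining fields are hostnames.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit !found }
    ' "$file"
}
```

For example, hosts_ip_for cdhdatanode1 should print 192.168.0.101 on every node once the mapping above has been added, and it returns a non-zero status when the entry is missing.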

ERROR Failed to collect NTP metrics – Cloudera Manager Agent

If you see the error “Failed to collect NTP metrics” in the Cloudera Manager Agent, the following solution might help. The error occurs when no NTP service is running on the host. NTP keeps the system time in sync with network time. The commands below work on CentOS/RHEL systems.

yum install ntp

systemctl enable ntpd

systemctl restart ntpd
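After the restart, it can take a few minutes for ntpd to pick a time source. A small sketch for checking this is given below. ntp_system_peer is a hypothetical helper name; it relies on the fact that ntpq marks the selected peer with a leading "*" in its peer list.

```shell
# ntp_system_peer is a hypothetical helper: read `ntpq -pn` output on stdin
# and print the peer that ntpd has selected as its time source.
# It returns a non-zero status if no peer has been selected yet.
ntp_system_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}
```

Usage: ntpq -pn | ntp_system_peer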

RJDBC java.lang.OutOfMemoryError

You might see the error below while making JDBC connections from R programs.

java.lang.OutOfMemoryError: Java heap space

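If you face Java heap space exceptions like the one above in RJDBC connections, increase the Java heap size from your R program. The option must be set before the RJDBC library is loaded, because the heap size cannot be changed once the JVM has started. A sample snippet is given below.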

options(java.parameters = "-Xmx8048m")
library("RJDBC")

or

options(java.parameters = "-Xmx8g")
library("RJDBC")

Hope this helps you.

How to extract a tar.gz file quickly in Linux

Recently I got a tar.gz file of around 30 GB that expands to approximately 4 TB on extraction. I wanted to speed up the extraction, as the normal extraction was taking approximately a day. I searched a lot and finally figured out a solution.

The solution was pigz, a parallel implementation of gzip. Decompression itself still happens in a single thread, but pigz uses separate threads for reading, writing and checksum calculation, so the overall performance is far better than with normal gzip.

The command to install pigz in CentOS or RHEL is given below. Ensure that the EPEL repository is enabled on your system.

yum install pigz

The command to extract a tar.gz file using pigz is given below.

pigz -dc compressed.tar.gz | tar xf -

If you want to see the progress of the extraction process, you need to use Pipe Viewer (pv). PV (“Pipe Viewer”) is a tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

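Pipe Viewer can be installed in CentOS or RHEL using the following command.

yum install pv

Using pv, we can monitor the progress of the decompression process. Placed after pigz as shown here, pv reports the amount of data extracted and the throughput, but it cannot estimate the time to completion because it does not know the total size of the stream.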

pigz -dc compressed.tar.gz | pv | tar xf -
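If a percentage and an ETA are wanted, one option is to let pv read the archive itself, since it then knows the total size (the percentage refers to compressed bytes read). The function below is a sketch of that idea, not a tested production script. extract_tgz is a hypothetical helper name, and the fallbacks to plain gzip and to running without pv are my own additions for machines where those tools are not installed.

```shell
# extract_tgz is a hypothetical helper: extract a .tar.gz into a directory,
# using pigz and pv when they are installed.
extract_tgz() {
    archive=$1
    dest=${2:-.}

    # Fall back to plain gzip when pigz is not installed.
    if command -v pigz >/dev/null 2>&1; then
        unzip_cmd=pigz
    else
        unzip_cmd=gzip
    fi

    mkdir -p "$dest" || return 1

    if command -v pv >/dev/null 2>&1; then
        # pv reads the file directly, so it can show a percentage and an ETA.
        pv "$archive" | "$unzip_cmd" -dc | tar -xf - -C "$dest"
    else
        "$unzip_cmd" -dc "$archive" | tar -xf - -C "$dest"
    fi
}
```

Usage: extract_tgz compressed.tar.gz /data/extracted (the destination path here is only an example).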


How to find and kill a process locking a particular port in Linux?

Sometimes an application stops working because of a bug or some other issue, but its process keeps holding the port. This kind of issue is very common with MySQL server, Elasticsearch, web services, Tomcat etc. In such scenarios, we have to find the stale process and kill it to free up the port.

How to find the process that locks the port?

Use the following command. Run it as root to see the process IDs of processes owned by other users.

netstat -tulpn | grep <port>

The output of this command contains the process ID. Now we just need to kill the process.
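A plain grep on the port number also matches other ports and PIDs that merely contain those digits. The sketch below matches the local address column exactly and prints only the PIDs. pids_on_port is a hypothetical helper name, and the column layout is assumed to be that of netstat -tulpn on Linux.

```shell
# pids_on_port is a hypothetical helper: read `netstat -tulpn` output on
# stdin and print the PID of every process bound to the given port.
pids_on_port() {
    port=$1
    awk -v port="$port" '
        # Column 4 is the local address, e.g. 0.0.0.0:3306 or :::3306.
        $4 ~ (":" port "$") {
            # The "PID/program name" column comes after the addresses.
            for (i = 6; i <= NF; i++)
                if ($i ~ /^[0-9]+\//) {
                    split($i, parts, "/")
                    print parts[1]
                    break
                }
        }
    ' | sort -u
}
```

Usage: netstat -tulpn | pids_on_port 3306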

Verify the process

Before killing the process, figure out what process it is and make sure we are not killing a required process.

ps -fp <process id>

The output of the above command will give the details of the process.

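How to kill the process?

After confirming the details, you can kill the process. Try a plain kill <process id> first so that the application can shut down cleanly, and use kill -9 only if the process does not exit.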

kill -9 <process id>

Now verify whether the port is still in use by executing the netstat command again.

How to auto connect OpenVPN during Windows boot up?

Generally we establish a VPN connection with OpenVPN using the connect option in the GUI application. Sometimes we come across situations in which the VPN needs to connect automatically at system boot.

I had such a requirement. I have a desktop server which is located remotely and I want to access it from my laptop. The desktop is accessible only through my VPN. So if someone turns off the desktop, the VPN needs to connect automatically on reboot so that I can access it from my network without any assistance from others. Here are the steps that I followed to achieve this. I created a task in the Windows Task Scheduler. My operating system was Windows 10 (the same steps will work in all recent versions of Windows).

Step 1: Open Task Scheduler

Search for Task Scheduler and open it.

[Screenshot: opening Task Scheduler]

Step 2: Click on Create Task

Once you open the Task Scheduler, you can see several options. Select the Create Task option to create a new task.

[Screenshot: Create Task option]

Step 3: Configure the Task details

Start creating the task by filling in the details in the General section as shown in the screenshot.

[Screenshot: task details in the General section]

Step 4: Add new Trigger to the Task

A trigger tells the system when to run the action. We need to create a new trigger for this task. Click on New and create a trigger as explained in the next step.

[Screenshot: creating a new trigger]

Step 5: Configure the Trigger

We will configure the trigger details in this section. For “Begin the task”, choose “At startup”. This means the task will be triggered during the startup of the system. Further tweaking can be done by setting the parameters in the advanced settings section.

[Screenshot: trigger settings]

Step 6: Create new Action

This is the main section. This is the action that gets triggered by the task. Here we need to select the action “Start a program”.

[Screenshot: creating a new action]

Step 7: Configure Action

Our program is the OpenVPN client. Browse to the OpenVPN installation and select openvpn-gui.exe. The most important part is the arguments field, where we specify the config file to connect with. Here my config file is amal.ovpn, located in the config directory of the OpenVPN installation. If this argument is missing, the OpenVPN auto connect will not work. To test the command, the simplest way is to run it directly from the command line (PowerShell is recommended).

Eg: Go to the bin directory of OpenVPN (C:\Program Files\OpenVPN\bin) and open PowerShell there.

Execute the command shown in the screenshot, replacing “amal.ovpn” with the name of your VPN config file.

[Screenshot: testing the command in PowerShell]

If the above command works, complete the action configuration and save the details.

[Screenshot: action settings]

Note: The amal.ovpn is the vpn configuration file and is located in the OpenVPN config directory which defaults to “C:\Program Files\OpenVPN\config”

After configuring this, click OK and save the task. Then test the task by rebooting the system. I have configured this setup several times in several places and it worked perfectly.

Hope this article helped you 🙂. If you are facing any issues, please comment on this post and I will be happy to help you.


How to maintain packages in a Python project?

In most cases, we need external packages to develop a Python program. These external packages are available either in the PyPI repository or locally as archive files. Usually people just install the packages directly into the Python environment using the pip command.

By default, the pip command installs packages from the PyPI repository. If we do not specify a version, it selects the latest available version of the package that supports the Python present in the environment (Python 2 or 3). Packages may undergo drastic changes in newer releases, so an application developed with version X of a package may not work with version Y of the same package. Simply noting down the package names will therefore not help to manage the project. We need the list of all packages with their versions. Manually installing the packages one by one is also difficult, because there can be several tens of packages within a single project.

The best practices for managing packages in a project are:

  1. Use a Python virtual environment.
  2. Create a requirements.txt to maintain the package details.

The details on how to create and use a virtual environment are explained in my previous post.

requirements.txt is a simple text file that lists all the package dependencies with their versions. A sample is given below.

Click==7.0
Flask==1.0.2
Flask-Cors==3.0.7
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
six==1.12.0
Werkzeug==0.14.1

All the packages listed in the file can be installed using a single command.

pip install -r requirements.txt

Packages in an environment can be captured in a requirements.txt file in one shot using the following command.

pip freeze > requirements.txt

This practice helps developers manage the dependency list and makes it easy to move the code to another environment.
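Since the whole point is to pin versions, it can be useful to check that no entry in requirements.txt has lost its version. The function below is a small sketch of such a check. unpinned_requirements is a hypothetical helper name, and it only recognises exact pins written with "==".

```shell
# unpinned_requirements is a hypothetical helper: print every requirement
# line that is not pinned to an exact version with "==".
# The exit status is 0 if unpinned entries were found, non-zero otherwise.
unpinned_requirements() {
    # Skip blank lines, comments and pip options (lines starting with "-").
    grep -v -E '^[[:space:]]*($|#|-)' "$1" | grep -v '=='
}
```

Usage: unpinned_requirements requirements.txt prints nothing for the sample file above, because every package in it is pinned.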

How to check whether a Raspberry Pi is 32 bit or 64 bit?

Recent Raspberry Pi models come with a 64-bit CPU, while earlier models had a 32-bit CPU. Some software depends on the CPU and OS architecture.

There are various options to check the architecture.

Method 1:

Type the following command and check the response.

uname -m

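You will get a response such as armv6l, armv7l or aarch64. The command reports the architecture of the running kernel, so a 64-bit CPU running a 32-bit kernel still reports armv7l.

ARMv7 and below are 32-bit. ARMv8 introduces the 64-bit instruction set, which Linux reports as aarch64.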
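The mapping above can be wrapped in a small function for use in install scripts. This is only a sketch: arch_bits is a hypothetical helper name, and the list of machine strings covers the common Raspberry Pi values rather than every possible architecture.

```shell
# arch_bits is a hypothetical helper: classify a `uname -m` value as 32 or
# 64 bit. With no argument it classifies the machine it runs on.
arch_bits() {
    case "${1:-$(uname -m)}" in
        aarch64|arm64) echo 64 ;;       # 64-bit ARM kernel
        armv6*|armv7*) echo 32 ;;       # 32-bit ARM kernel
        *)             echo unknown ;;  # anything this sketch does not cover
    esac
}
```

The command getconf LONG_BIT is a related check: it prints the word size (32 or 64) of the installed userland rather than of the kernel.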

Method 2:

Install lshw using the command

apt-get install lshw

Then type the command lshw. You will be able to find the architecture in the output of the command.

How to enable docker-compose to always rebuild containers from fresh images?

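By default, docker-compose reuses existing containers and cached images. If you don’t want this to happen and want to recreate all the containers even when their configuration and image have not changed, the following command will help you. Note that --force-recreate only recreates the containers; to rebuild the images as well, add the --build option, or run docker-compose build --no-cache first to bypass the build cache.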

docker-compose up --force-recreate


How to clear/delete the cached Kerberos ticket?

In Linux

kdestroy


In Windows

klist purge