dependency xml is not available

The error “dependency xml is not available” can be resolved by installing the following packages.

For CentOS/RHEL

yum install libxml2 libxml2-devel

For Ubuntu

apt-get install libxml2-dev

How to check the performance of DNS in your network?

I was looking for tools to benchmark the performance of the DNS servers in my network. The goal of this test was to identify the root cause of slow internet access within my network. One good free tool that I found online is DNS Benchmark Tool.

This is a very lightweight and portable tool. It is just 180 KB and helps us perform a DNS speed test. With this tool I also discovered an unidentified DNS server running on an individual's laptop.


I found this tool to be a useful utility.

How to split a list into chunks using Python

To split a large list into a fixed number of smaller lists, you can use numpy.array_split, as in the following snippet. When the list does not divide evenly, the first chunks get one extra element.

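import numpy

num_splits = 3
large_list = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]
# array_split accepts lengths that are not a multiple of num_splits
split_lists = numpy.array_split(large_list, num_splits)
for split in split_lists:
    # tolist() converts the NumPy values back to plain Python ints
    print(split.tolist())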
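If adding numpy as a dependency just for this is not desirable, the same kind of split can be sketched in plain Python. The function name split_into_chunks is hypothetical and not part of any library. The sketch assumes num_splits is a positive integer and gives the first chunks one extra element when the list does not divide evenly.

```python
def split_into_chunks(items, num_splits):
    """Split items into num_splits consecutive chunks of near-equal size."""
    # base is the minimum chunk size; the first `extra` chunks get one more item
    base, extra = divmod(len(items), num_splits)
    chunks = []
    start = 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


large_list = list(range(1, 27))
for chunk in split_into_chunks(large_list, 3):
    print(chunk)
```

For the 26-element list above this yields chunks of 9, 9 and 8 elements.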

 

CDH cluster installation failing in “distributing” stage: failure due to stall on seeded torrent

I faced this issue while distributing the downloaded packages in Cloudera Manager.

The solution that worked for me was to add the IP address to hostname mappings to the /etc/hosts file on the Cloudera Manager server and on every agent host.

/etc/hosts

192.168.0.101   cdhdatanode1
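With many hosts it is easy to miss an entry, so a small check can help. The following is a minimal sketch, not part of Cloudera Manager: the function names are hypothetical, and cdhdatanode2 with 192.168.0.102 is an invented example host. It parses hosts-file text and reports which expected mappings are missing or different.

```python
def parse_hosts(text):
    """Return a dict mapping each hostname or alias to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping


def missing_mappings(hosts_text, expected):
    """Return the expected name -> IP pairs that hosts_text does not contain."""
    actual = parse_hosts(hosts_text)
    return {name: ip for name, ip in expected.items() if actual.get(name) != ip}


sample_hosts = """\
127.0.0.1       localhost
192.168.0.101   cdhdatanode1
"""
# cdhdatanode2 is a made-up second node, used only for illustration
expected = {"cdhdatanode1": "192.168.0.101", "cdhdatanode2": "192.168.0.102"}
print(missing_mappings(sample_hosts, expected))
```

On a real host, the text could be read from /etc/hosts and the expected dictionary filled with the actual cluster nodes.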

ERROR Failed to collect NTP metrics – Cloudera Manager Agent

If you are facing an error like “Failed to collect NTP metrics”, the following solution might help you. The error occurs because no NTP service is running on the host. NTP syncs the system time with the network time. The solution below works for CentOS/RHEL systems.

yum install ntp

systemctl enable ntpd

systemctl restart ntpd

RJDBC java.lang.OutOfMemoryError

You might see the error below while making JDBC connections from R programs.

java.lang.OutOfMemoryError: Java heap space

If you face Java heap space exceptions like the one above in RJDBC connections, increase the Java heap size from your R program. Set the option before loading RJDBC, because the JVM reads it only once, at startup. A sample snippet is given below.

options(java.parameters = "-Xmx8048m")
library("RJDBC")

or

options(java.parameters = "-Xmx8g")
library("RJDBC")

Hope this helps you.

How to extract a tar.gz file quickly in Linux

Recently I got a tar.gz file of around 30 GB that expands to approximately 4 TB on extraction. I wanted to speed up the extraction, as the normal extraction was taking approximately a day. I searched a lot and finally found a solution.

The solution was pigz, a parallel implementation of gzip. During decompression it uses separate threads for reading, writing and checksum calculation, while the decompression itself happens in a single thread. Even so, the overall performance is far better than with normal gzip.

The command to install pigz on CentOS or RHEL is given below. Ensure that the EPEL repository is enabled on your system.

yum install pigz

The command to extract a tar.gz file using pigz is given below.

pigz -dc compressed.tar.gz | tar xf -

If you want to see the progress of the extraction process, you can use Pipe Viewer (pv), a tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Pipe Viewer can be installed on CentOS or RHEL using the following command.

yum install pv

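Using pv, we can monitor the progress of the decompression process. Placed after pigz, pv reports throughput and elapsed time, but it cannot estimate the time to completion because it does not know the decompressed size.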

pigz -dc compressed.tar.gz | pv | tar xf -

 

How to find and kill a process locking a particular port in Linux?

Sometimes, because of an issue or bug, our application may stop working while its port stays locked. This kind of issue is very common with MySQL server, Elasticsearch, web services, Tomcat, etc. In such scenarios, we have to find the lingering process and kill it to free up the locked port.

How to find the process that locks the port?

Use the following command. Run it as root to see the process IDs of processes owned by other users.

netstat -tulpn | grep <port>

The output of this command will contain the process ID in the last column. Now we just need to kill the process.
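A plain grep on the port number can also match unrelated lines, for example when the same digits appear in a process ID. As a hedged sketch, the function below (pids_on_port is a hypothetical name) parses text in the column layout that netstat -tulpn typically prints and returns only the entries whose local port matches. The sample output is made up for illustration.

```python
import re


def pids_on_port(netstat_output, port):
    """Return (pid, program) pairs for sockets bound to the given local port."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # skip headers and anything that is not a tcp/udp socket line
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        # the fourth column is the local address, e.g. 0.0.0.0:3306 or :::8080
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port != str(port):
            continue
        # the last column looks like "1234/mysqld"; it is "-" without root
        owner = re.search(r"\s(\d+)/(.+?)\s*$", line)
        if owner:
            matches.append((int(owner.group(1)), owner.group(2)))
    return matches


sample_output = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      1234/mysqld
tcp6       0      0 :::8080                 :::*                    LISTEN      5678/java
udp        0      0 0.0.0.0:68              0.0.0.0:*                           910/dhclient
"""
print(pids_on_port(sample_output, 3306))
```

Note that searching for port 1234 in this sample returns nothing, whereas a plain grep for 1234 would match the mysqld line through its process ID.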

Verify the process

Before killing the process, find out what it is and make sure we are not killing a required process.

ps -aux | grep <process id>

The output of the above command will give the details of the process.

How to kill a process?

After confirming the details, you can kill the process.

kill -9 <process id>

Now verify whether the port is still locked by executing the netstat command again.