Greenhouse Farming

What is Greenhouse Farming?

Greenhouse farming is a technique in which crops are cultivated in a controlled, enclosed environment. Its main purpose is to shield the crops from the effects of weather changes.

Advantages of Greenhouse Farming

  • Plants get an environment with consistent temperature and humidity
  • Plants are protected from birds and other organisms
  • Moisture in the soil does not evaporate easily
  • Pests are easier to control
  • The fields are easier to maintain
  • The growing environment is not affected by the external weather

Below is a picture of one of the greenhouses I visited recently.

[Image: greenhouse]

This greenhouse is used to cultivate roses. The plants are arranged in neat rows, and drip irrigation runs across the whole plantation.

Newly formed rose buds are wrapped in a protective net that keeps them in shape and shields them from damage. These nets help the buds develop in a controlled way. A high-quality rosebud is large (a long bud with a well-formed, heavy base). If you look closely, you can see these nets on the buds in the picture above. A sample image of a rosebud net is shown below.

[Image: rosebud net]

dependency xml is not available

The error “dependency xml is not available” can be resolved by installing the following packages.

For CentOS/RHEL

yum install libxml2 libxml2-devel

For Ubuntu

apt-get install libxml2-dev
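If the same setup script has to run on both families of distributions, a small helper can pick the right command. The sketch below is an assumption, not part of the original fix: it matches the ID field from /etc/os-release against a few common values (not an exhaustive list), uses a hypothetical function name, and adds -y so the install runs non-interactively.

```shell
#!/bin/bash
# Print the libxml2 install command for a distribution ID (the ID field of
# /etc/os-release). Hypothetical helper; only a few common IDs are handled.
libxml2_install_cmd() {
  case "$1" in
    centos|rhel)   echo "yum install -y libxml2 libxml2-devel" ;;
    ubuntu|debian) echo "apt-get install -y libxml2-dev" ;;
    *)             return 1 ;;   # unknown distribution: print nothing, fail
  esac
}

# Usage (as root):
#   . /etc/os-release && $(libxml2_install_cmd "$ID")
```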

Delta Science – The art of designing a new-generation Data Lake

When we hear about Delta Lake, the first question that comes to mind is “What is Delta Lake and how does it work?”

“Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads”

But how is it possible to maintain transactions in the big data world? The answer is simple: Delta Lake uses the Delta format.

Delta Lake stores data in the Delta format, which consists of versioned Parquet files plus scalable metadata. The data itself is stored internally as Parquet, and every change made to the data is recorded in the metadata, a transaction log kept in the table's _delta_log directory. As a result, the metadata grows along with the data.
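To make the versioning concrete: each commit to a Delta table adds a JSON file to _delta_log, named with a 20-digit, zero-padded version number (00000000000000000000.json, 00000000000000000001.json, ...). The sketch below, assuming the table sits on a local filesystem path and using a hypothetical function name, finds the latest committed version by looking at those file names. For a table on S3 you would list the prefix with aws s3 ls instead.

```shell
#!/bin/bash
# Print the newest commit version found in a local Delta table's transaction log.
# Assumption: $1 is a local table directory containing _delta_log/.
latest_delta_version() {
  local log_dir="$1/_delta_log" f name last=""
  # Glob results come back sorted, and commit files share the same 20-digit
  # zero-padded width, so the last matching name is the newest commit.
  for f in "$log_dir"/*.json; do
    name=${f##*/}
    [[ $name =~ ^[0-9]{20}\.json$ ]] && last=$name
  done
  # No commit files at all: not a Delta table (or an empty log).
  [ -n "$last" ] || return 1
  # Drop the .json suffix and parse in base 10 so leading zeros are ignored.
  echo $((10#${last%.json}))
}

# Usage (hypothetical path):
#   latest_delta_version /data/events
```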

The Delta format solves several major challenges in the big data lake world. Some of them are listed below:

  1. Transaction management
  2. Versioning
  3. Incremental Load
  4. Indexing
  5. UPSERT and DELETE operations
  6. Schema Enforcement and Schema Evolution

I will expand this post by explaining each of the above features and the internals of Delta Lake in more detail.

How to configure Delta Lake on EMR?

EMR releases 5.24.x and higher ship with Apache Spark 2.4.2 or higher, which is the Spark version Delta Lake requires, so Delta Lake can be enabled on EMR 5.24.x and above. Delta Lake is not enabled on EMR by default, but enabling it is easy.

We just need to add the delta-core jar to Spark's jars folder. This can be done manually, or more easily with a custom bootstrap script; a sample script is given below. Upload the delta-core jar to an S3 bucket, and the script copies it into the Spark jars folder. The delta-core jar can be downloaded from the Maven repository, or you can build it yourself from the source code, which is available on GitHub.

Adding the script as a bootstrap action makes EMR perform this step automatically while provisioning the cluster. Store the script below in an S3 location and pass it as the bootstrap script.

copydeltajar.sh

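#!/bin/bash
set -e
# Bootstrap actions run as the hadoop user, so sudo is needed to write under /usr/lib
sudo mkdir -p /usr/lib/spark/jars
sudo aws s3 cp s3://mybucket/delta/delta-core_2.11-0.4.0.jar /usr/lib/spark/jars/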
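Before launching the cluster, both the jar and the bootstrap script have to be in S3. The sketch below is one way to stage them: it builds the Maven Central URL for delta-core (standard Maven layout, group io.delta, artifact delta-core_2.11, version 0.4.0), downloads it with curl, and uploads it with the AWS CLI. The function names are hypothetical, and the destinations you pass in must match the paths used in copydeltajar.sh and in the cluster launch command.

```shell
#!/bin/bash
# Build the Maven Central download URL for a delta-core jar.
# Maven layout: <group path>/<artifactId>/<version>/<artifactId>-<version>.jar
delta_core_url() {
  local scala_version="$1" delta_version="$2"
  local artifact="delta-core_${scala_version}"
  echo "https://repo1.maven.org/maven2/io/delta/${artifact}/${delta_version}/${artifact}-${delta_version}.jar"
}

# Download the jar, then upload it and the bootstrap script to S3.
# $1: S3 prefix for the jar, $2: S3 prefix for copydeltajar.sh (hypothetical paths).
stage_delta_files() {
  local jar_dest="$1" script_dest="$2" url jar
  url=$(delta_core_url 2.11 0.4.0)
  jar=${url##*/}   # file name part of the URL, e.g. delta-core_2.11-0.4.0.jar
  curl -fSL -o "$jar" "$url" &&
    aws s3 cp "$jar" "$jar_dest" &&
    aws s3 cp copydeltajar.sh "$script_dest"
}

# Usage:
#   stage_delta_files s3://mybucket/delta/ s3://mybucket/bootstrap/
```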

You can launch the cluster using either the AWS web console or the AWS CLI. A sample AWS CLI command is given below.

aws emr create-cluster --name "Test cluster" --release-label emr-5.25.0 \
--use-default-roles --ec2-attributes KeyName=myDeltaKey \
--applications Name=Hive Name=Spark \
--instance-count 3 --instance-type m5.xlarge \
--bootstrap-actions Path="s3://mybucket/bootstrap/copydeltajar.sh"
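Once the cluster is up, a quick way to confirm that the bootstrap action worked is to check for the jar on a node (for example over SSH on the master node). The helper below is a hypothetical sketch; it defaults to /usr/lib/spark/jars, the folder the bootstrap script copies into, and matches any delta-core_*.jar file name.

```shell
#!/bin/bash
# Succeed if a delta-core jar is present in the Spark jars folder.
# $1 (optional): jars directory, defaulting to the one used by copydeltajar.sh.
has_delta_jar() {
  local jars_dir="${1:-/usr/lib/spark/jars}"
  # compgen -G exits non-zero when the glob matches no files
  compgen -G "$jars_dir/delta-core_*.jar" > /dev/null
}

# Usage:
#   has_delta_jar && echo "Delta Lake jar installed"
```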

 

How to clear/delete the cached Kerberos ticket?

On Linux

kdestroy
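To confirm that the ticket is really gone, kdestroy can be followed by klist -s, which prints nothing and exits with a non-zero status when there is no valid credentials cache. The sketch below wraps both in a hypothetical clear_kerberos_ticket function and assumes the MIT Kerberos client tools are installed.

```shell
#!/bin/bash
# Destroy the cached Kerberos ticket, then verify that no valid ticket remains.
clear_kerberos_ticket() {
  # kdestroy complains if there is no cache to destroy; that is fine here
  kdestroy 2>/dev/null
  # klist -s is silent and succeeds only while a valid ticket is still cached
  if klist -s; then
    echo "ticket still cached"
    return 1
  fi
  echo "no cached ticket"
}
```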

 

On Windows

klist purge

Disable the auto-restart policy of a Docker container

If a Docker container is started with --restart=always, Docker restarts it whenever it stops, and even if you stop it manually, it comes back up as soon as the Docker daemon (or the host) restarts. We can change this behavior by modifying the restart policy. Refer to the official Docker documentation for more info.

For example, the following command starts an nginx container with the always policy:

docker run -d --restart=always -p 80:80 -it nginx

To change the restart policy of the existing container, run the following command.

docker update --restart=no your-container

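Alternatively, the unless-stopped policy still restarts the container automatically, but once you stop it manually it stays stopped, even after the Docker daemon restarts: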

docker update --restart=unless-stopped your-container
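To confirm that docker update took effect, docker inspect can print the container's HostConfig.RestartPolicy.Name field (typically no, always, unless-stopped or on-failure). The two helpers below are a hypothetical sketch around that call; your-container is the same placeholder name used above.

```shell
#!/bin/bash
# Print the restart policy currently stored for a container.
restart_policy() {
  docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Succeed only if the container's restart policy equals the expected value.
assert_restart_policy() {
  local container="$1" expected="$2" actual
  actual=$(restart_policy "$container") || return 1
  [ "$actual" = "$expected" ]
}

# Usage:
#   docker update --restart=no your-container
#   assert_restart_policy your-container no && echo "auto restart disabled"
```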