Hue Error – DatabaseError: database is locked

You may face this error in Hue while using Impala or Hive. This is because of the lock happening in the backend database used in Hue. Hue uses a backend database to store all the metadata and history. By default it uses sqlite, which is not suitable for multiuser environments. The usage of the sqlite causes this issue.

We can resolve this by using mysql, postgresql or oracle database as the metastore for hue.

The detailed steps are explained in this document

Advertisements

Undeleting and purging KeyTrustee Key Provider methods via the REST interface

HDFS Data encryption is an excellent feature that came recently. With this we can encrypt the data in hdfs. We can create multiple encryption zones with different encryption keys. In this way, we can secure the data in hdfs properly. For more details, you can visit these websites. Reference1, Reference2

I am using a cluster installed with CDH. I created some encryption keys and zones.
The command I used for creating a key is given below.

# As the normal user, create a new encryption key
hadoop key create amalKey
 

# As the super user, create a new empty directory and make it an encryption zone
hadoop fs -mkdir /user/amal
hdfs crypto -createZone -keyName amalKey -path /user/amal
 

# chown it to the normal user
hadoop fs -chown amal:hadoop /user/amal
 

# As the normal user, put a file in, read it out
hadoop fs -put test.txt /user/amal/
hadoop fs -cat /user/amal/test.tx
 

After some days, I deleted the encryption zone and I deleted the encryption key also.
The command I used for deleting the encryption key is given below

hadoop key delete <key-name>

After the deletion, I tried creating the key with the same name. But I got an exception that the key is still present in the disabled state. When I list the keys, I am not able to see the key. The exception that I got was given below.

amalKey has not been created. java.io.IOException: HTTP status [500], exception [com.cloudera.keytrustee.TrusteeKeyProvider$DuplicateKeyException], message [Key with name "amalKey" already exists in "com.cloudera.keytrustee.TrusteeKeyProvider@6d88562. Key exists but has been disabled. Use undelete to enable.]
java.io.IOException: HTTP status [500], exception [com.cloudera.keytrustee.TrusteeKeyProvider$DuplicateKeyException], message [Key with name "amalKey" already exists in "com.cloudera.keytrustee.TrusteeKeyProvider@6d88562. Key exists but has been disabled. Use undelete to enable.]
at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:159)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:545)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:503)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createKeyInternal(KMSClientProvider.java:676)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createKey(KMSClientProvider.java:684)
at org.apache.hadoop.crypto.key.KeyShell$CreateCommand.execute(KeyShell.java:483)
at org.apache.hadoop.crypto.key.KeyShell.run(KeyShell.java:79)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:515)

In the error logs, it says to use purge option to permanently delete the key and undelete to recover the deleted key. But I was not able to find these options with hadoop key command. I googled it and I couldn’t figure out this issue. Finally I got the guidance from one guy from cloudera to execute the purge & undelete commands through rest api of keytrustee and he gave a nice explanation for my issue. I am briefly putting the solution for this exception below.

The delete operation on the Trustee key provider is a “soft delete”, meaning that is possible to “undelete” the key. It is also possible to “purge” the key to delete it permanently. Because these operations are not part of the standard Hadoop key provider API, they are not currently exposed through Hadoop KeyShell (hadoop key). However, you can call these operations directly via the Trustee key provider REST API.

See the examples below.

Use KeyShell to list existing keys

$ ./bin/hadoop key list -provider kms://http@localhost:16000/kms
 
Listing keys for KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
amal-testkey-1

Use KeyShell to delete an existing key

$ ./bin/hadoop key delete amal-testkey-1 -provider kms://http@localhost:16000/kms
 
Deleting key: ajy-testkey-1 from KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
amal-testkey-1 has been successfully deleted.
KMSClientProvider[http://localhost:16000/kms/v1/] has been updated.

Use KeyShell to verify the key was deleted

$ ./bin/hadoop key list -provider kms://http@localhost:16000/kms
 
Listing keys for KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
 

Use the KeyTrustee key provider REST API to undelete the deleted key

$ curl -L -d "trusteeOp=undelete" "http://localhost:16000/kms/v1/trustee/key/amal-testkey-1?user.name=amal&trusteeOp=undelete"

Use KeyShell to verify the key was restored

$ ./bin/hadoop key list -provider kms://http@localhost:16000/kms
 
Listing keys for KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
amal-testkey-1

Use the KeyTrustee key provider REST API to purge the restored key

$ curl L -d "trusteeOp=purge" "http://localhost:16000/kms/v1/trustee/key/amal-testkey-1?user.name=amal&trusteeOp=purge"

Use KeyShell to verify the key was deleted

$ ./bin/hadoop key list -provider kms://http@localhost:16000/kms
 
Listing keys for KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
 

Use the KeyTrustee key provider REST API to attempt to undelete the purged key

$ curl -L -d "trusteeOp=undelete" "http://localhost:16000/kms/v1/trustee/key/amal-testkey-1?user.name=amal&trusteeOp=undelete"

{
"RemoteException" : {
"message" : "Key with name amal-testkey-1 not found in com.cloudera.keytrustee.TrusteeKeyProvider@6d88562",
"exception" : "IOException",
"javaClassName" : "java.io.IOException"
}
}

Configure ACLs for KeyTrustee undelete, purge and migrate operations

ACLs for the KeyTrustee specific undelete, purge and migrate operations are configured in kts-acls.xml. Place this file in the same location as your kms-acls.xml file. See example below.

<property>
   <name>keytrustee.kms.acl.UNDELETE</name>
     <value>*</value>
       <description>
          ACL for undelete-key operations.
      </description>
</property>
 
<property>
  <name>keytrustee.kms.acl.PURGE</name>
    <value>*</value>
      <description>
         ACL for purge-key operations.
      </description>
</property>
 
<property>
  <name>keytrustee.kms.acl.MIGRATE</name>
    <value>*</value>
     <description> 
      ACL for purge-key operations.
     </description>
</property>
 

Note: In kerberized environments, the requests will be a little different. It will be in the following format

Eg :
curl -L --negotiate -u [username]  -d "trusteeOp=undelete" "http://localhost:16000/kms/v1/trustee/key/amal-testkey-1?user.name={username}&trusteeOp=undelete"

Installing Cloudera Manager in an existing hadoop cluster

Cloudera Manager is an Infrastructure management and monitoring tool provided by cloudera. This has now became a very excellent tool to manage bigdata infrastructure. The pain of administrators has been reduced by 80% with this cloudera manager. Almost everything required for an administrator is integrated into this great software and is very user friendly. Cloudera Manager became this muhc powerful recently. So lot of existing clusters are still running without using cloudera manager. If you want to manage an existing cluster using cloudera manager, the following steps may help you. For this you have to completely uninstall the existing hadoop set up. No data loss will happen because we are not touching any data. The configurations also will remain the same. These are just pointers.

1) Stop all the services
2) Back up hive metastore, Namenode metadata and all the other required metastores (Eg hue, oozie)
3) Back up all the configurations
4) Note down the existing storage directories
5) Uninstall all the hadoop services (Never touch the data)
6) Install Cloudera Manager Server and Agent
7) Install all the services (It should be same version as that of previous to make installation smoother)
8) Add the configurations (Use the same configurations as that of previous. There is an option to add xml configs in CM)
9) Point the storage directories in the cloudera manager configurations.
10) Point the new installation to the existing metastore (hive, oozie, hue etc)
11) Start all the services (Don’t format the namenode)
12) Test the cluster

Facebook Opensourced Presto

Facebook opensourced its data processing technique ‘Presto’ to the world. Presto is a distributed query engine based on ANSI SQL. It is very optimized and currently running with more than 300 petabytes of data, which may one among the top big data processing systems. Presto is a totally different from mapreduce. It is an in memory data processing mechanism and is very much optimised. From the details given in the facebook newsletter and presto website, it is 10 times faster than Hive. mapreduce.Hive came from facebook only, so presto will definitely beat hive. Hive queries are ultimately running as multiple mapreduce jobs and it will take more time. From my point of view, the competition may be between Cloudera Impala and Presto. Impala’s performance with huge datasets is not available now from any production environments because it is a budding technology from cloudera family, but presto is already tested and running in huge dataset production environment. Another interesting fact about presto is that we can use the already existing infrastructure and hadoop cluster for deploying presto, because presto supports hdfs as its underlying data storage. It supports other storage systems also. So it is flexible. Leading internet companies including Airbnb and Dropbox are using Presto. Presto code and further details are available in this link

I have deployed Presto and Impala on a small cluster of 8 nodes. I haven’t got enough time to explore more on presto. I am planning to explore more on the coming days. 🙂