Setting Up Multiple Users in Hadoop Clusters

 

Need for multiple users

In Hadoop, we run different tasks and store data in HDFS.

If several users run jobs using the same user account, it will be difficult to trace the jobs and to track the tasks and defects of each individual user.

The other issue is security.

If everyone is given the same user account, all users will have the same privileges: anyone can access everyone else’s data, modify it, run jobs against it, and even delete it.

This is a very serious issue.

To avoid this, we need to create multiple user accounts.

Benefits of creating multiple users

1) A user cannot modify the directories or files of other users.

2) Other users cannot add new files to a user’s directory.

3) Other users cannot perform any tasks (MapReduce jobs, etc.) on a user’s files.

In short, data is safe and is accessible only to the assigned user and the superuser.

Steps for setting up multiple user accounts

To add a new user capable of performing Hadoop operations, follow the steps below.

Step 1

Creating a New User

For Ubuntu

sudo adduser --ingroup <groupname> <username>

For RedHat variants

useradd -g <groupname> <username>

passwd <username>

Then enter the user details and password.
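
For example, assuming a group named hadoopusers and a user named user1 (both names are only illustrative), the Ubuntu commands could look like this:

sudo addgroup hadoopusers
sudo adduser --ingroup hadoopusers user1

The adduser command will then prompt interactively for the password and the other user details.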

Step 2

We need to change the permissions of the directory in HDFS where Hadoop stores its temporary data.

Open the core-site.xml file

Find the value of hadoop.tmp.dir.

In my core-site.xml, it is /app/hadoop/tmp. In the following steps, I will be using /app/hadoop/tmp as my directory for storing Hadoop data (i.e., the value of hadoop.tmp.dir).
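
For reference, the property usually appears in core-site.xml roughly like this (the value shown is the one from my setup; yours may differ):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>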

Then, from the superuser account, run the following command. (The leading 1 in mode 1777 sets the sticky bit, so every user can create files in the shared staging directory but cannot delete or move files belonging to other users.)

hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging

Step 3

The next step is to give write permission to our user group on hadoop.tmp.dir (here /app/hadoop/tmp; open core-site.xml to get the path for hadoop.tmp.dir). This should be done only on the machine (node) where the new user is added.

chmod 777 /app/hadoop/tmp
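
To confirm the change, you can check the local directory’s permissions; after chmod 777 the mode should read rwxrwxrwx:

ls -ld /app/hadoop/tmp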

Step 4

The next step is to create a directory structure in HDFS for the new user.

For that, create the directory structure as the superuser.

Eg: hadoop fs -mkdir /user/username/

Step 5

At this point we will still not be able to run MapReduce programs as the new user, because the ownership of the newly created directory structure lies with the superuser. So change the ownership of the newly created directory in HDFS to the new user.

hadoop fs -chown -R username:groupname <directory to access in HDFS>

Eg: hadoop fs -chown -R username:groupname /user/username/
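
To confirm the new ownership, list the parent directory from any account; the entry for /user/user1 (using the illustrative names from Step 1) should now show user1 as the owner and hadoopusers as the group:

hadoop fs -ls /user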

Step 6

Log in as the new user and run Hadoop jobs.

su - username

Note: Run Hadoop tasks only in the assigned HDFS path, i.e., /user/username.
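
As a quick sanity check (the user name and file name below are only illustrative), the new user can copy a local file into their HDFS directory and list it:

su - user1
hadoop fs -put sample.txt /user/user1/
hadoop fs -ls /user/user1/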
Enjoy…. 🙂



40 Responses to Setting Up Multiple Users in Hadoop Clusters

  1. bongirr says:

    I followed all the steps until this below gives me an error:
    hadoop fs -chown -R anand:hadoop /user/anand/

    Error:
    chown: changing ownership of ‘/user/anand’: Non-super user cannot change owner

  2. amalgjose says:

    Hi Bonjirr,
    The error is because you are trying to change the ownership from a non-super user.
    As which user did you create the directory /user/anand?
    Try
    hadoop fs -ls /
    and find the owner of /user directory or find the superuser.
    Then from the superuser, execute the command
    hadoop fs -chown -R anand:hadoop /user/anand/
    Eg: If the superuser is hadoop, then from root user execute the following command.
    sudo -u hadoop hadoop fs -chown -R anand:hadoop /user/anand/
    This will work.

  3. Sudhakar says:

    Hi Amal,

    hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging
    Instead of changing permissions like above, can i not set up the hadoop.tmp.dir property itself to use something like /app/hadoop/${user.name}. This will ensure that the temporary data for each user will be stored in his folder under the /app/hadoop folder.

    Since the folder does not get created automatically, I can create a directory for each user in the /app/hadoop folder and set appropriate permissions here and NOT in hadoop fs shell.

    Will this not work? Let me know what you think.

    Thanks.

    • amalgjose says:

      Yes, you can set that. But the problem is that inside the staging directory in HDFS, sub-directories are automatically created based on the username. So if you follow the method you explained, the admin has to create this directory structure for every user.
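
      For clarity, the approach Sudhakar describes would look roughly like this in core-site.xml (a sketch only, not the configuration used in this post):

      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/${user.name}</value>
      </property>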

    • tudor lapusan says:

      Hi Sudhakar,
      It took a long time after you put the comment, but still… 😉
      I’m using hadoop-1.1.2.
      To allow multiple users to run jobs on my cluster, I needed to add write permission for the group on the following directories:
      1. hadoop fs -chmod g+w /app/hadoop/tmp/mapred/staging => rwxrwxr-x
      2. sudo chmod g+w tmp_directory (your local directory on the master node) => rwxrwxr-x

      In the HDFS staging directory (/app/hadoop/tmp/mapred/staging), a new directory is created with the name of the user that runs the job, so it is created automatically with the right permissions to run jobs.

  4. Ashish Dobhal says:

    Hii Amal:
    I am using the Hadoop 1.03 version and there is no staging directory inside my temp directory. There’s only a local directory inside it.

  5. sanjay says:

    Hi ,

    I am running Hadoop from the superuser, and while running MapReduce from another user it is not able to connect to the namenode because on the other machine no daemons are up (nothing shows in jps). Can you suggest anything?

    • amalgjose says:

      For submitting a mapreduce job, it is not necessary to submit from a machine where hadoop services are running.
      We can submit any jobs from a hadoop client.
      A hadoop client is just the hadoop libraries and proper configuration files.
      For jps, we need java to be installed and added to path.
      Check the hadoop config files in 2nd machine and ensure that it is same as that of 1st machine.
      Clear the datanode storage directory and start the services again.
      Check the log files for details.

    • amalgjose says:

      Check the config files core-site.xml and mapred-site.xml, and verify the properties fs.default.name and mapred.job.tracker.

  6. Riata says:

    Hi, thanks for sharing this… now I have a better understanding of how users work on Hadoop, really well explained. I recently installed Cloudera 5.1. I cannot find the tmp dir in the directories you mentioned before. Can you help me?

  7. bindu says:

    While I am running the command hadoop fs -chmod -R 1777 /app/hadoop/tmp/ it is showing an error, i.e., hadoop: command not found

    • amalgjose says:

      Hi,
      The reason for this issue is that your HADOOP_HOME/bin is not added to the PATH.
      Set the following variables in the .bashrc file
      nano ~/.bashrc

      export HADOOP_HOME=<path to hadoop home directory>
      export PATH=$PATH:$HADOOP_HOME/bin

      save and exit
      Then type
      source ~/.bashrc

      Then type hadoop.
      Else, go to $HADOOP_HOME/bin, then execute the command hadoop fs -chmod -R 1777 /app/hadoop/tmp/

  8. bindu says:

    ya i had set the path initially. actually i created user called hduser and from that user i installed hadoop.now if i run the chmod command its showing “chmod: Unknown command
    Usage: java FsShell ………….”
    some syntax it will show and in last line its showing
    “The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions] ”
    so i used the command bin/ hadoop fs –chmod -R 1777 /app/hadoop/tmp/
    now is showing
    bash: bin/hadoop: No such file or directory
    what exactly wrong i did can u have any idea???
    please help me

    • Doriane says:

      Hello,
      I also have this problem. I tried bin/hadoop fs -chmod -R 1777 … and it answered chmod unknown command then I tried … fs chmod … and it answered “chmod: Unknown command
      Did you mean -chmod? This command begins with a dash.”
      It’s turning me crazy
      please help us 2

      • amalgjose says:

        Seems like $HADOOP_HOME/bin is not added to the path. Add this to the path. Then try typing hadoop in the commandline. If that command exists, then you can try the command
        hadoop fs -chmod -R 1777

      • Doriane says:

        ok amalgjose, I’ll try this, Can you explain the aim of doing this command line ?

      • amalgjose says:

        Hadoop has a chmod command, and the syntax is correct. The problem probably occurred because of an issue with the path settings. If you are executing the command as bin/hadoop, you have to execute it from the $HADOOP_HOME directory; otherwise simply use hadoop.

    • amalgjose says:

      Seems like hadoop is not added to the path.
      Go to $HADOOP_HOME and execute this command

  9. sthapar says:

    Hi Amal, I have a cluster with 4 slaves and 1 master. I am trying to create multiple use accounts but I am not sure if I should do that on each node individually or only on the master?

    • amalgjose says:

      In a hadoop cluster, ideally users will not be in all the nodes to access the cluster. The access will be given from only the client/edge node. So the users should be created in that node only. In your case, you can make one node as client node and you can create the users in that node.

  10. sthapar says:

    Also, since I am using Hadoop 2.6.0, the file path “hadoop fs –chmod -R 1777 /app/hadoop/tmp/mapred/staging” in this command does not exist. Can you tell me what is the corresponding path in version 2.6.0? My hadoop.tmp.dir value is /app/hadoop/tmp.

    • amalgjose says:

      The location will be {hadoop.tmp.dir}/mapred/staging. Check whether you set hadoop.tmp.dir in core-site.xml. If not, the directory will be /tmp. This will be created only when you run a mapreduce job.

  11. rajat says:

    This command has kind of ruined my set up

    hadoop fs -chmod -R 777 /home/hduser/hadoop/tmp

    Now any x, y, z user with the same config files as the Hadoop slave node can access and write to any directory of HDFS.
    Suggest how to restrict access to one directory only.

    • amalgjose says:

      How come this ruined your setup? It might be because of some other configs done by you. Changing the permission of your staging directory will not cause any problem to any user’s directory. Check your hdfs-site.xml for the dfs.permissions.enabled property. By default it will be true. If you set it to false, it may ruin your setup. A lot of people have used this config, and I myself used it in several places. It was working perfectly.
      This was a post that I wrote two years ago. A lot of things have changed since then, so this may not be a perfect solution now.
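
      For reference, the property mentioned above sits in hdfs-site.xml and looks like this (true is the default, and it should stay true if you want per-user isolation):

      <property>
        <name>dfs.permissions.enabled</name>
        <value>true</value>
      </property>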

  12. upinder says:

    Hi Amal,

    I am having issues running mrjob and I tried your suggestion and I am getting permission error. could you give some suggestions ?

    hduser@hadoop1:~$ groups hduser
    hduser : hadoop sudo

    hduser@hadoop1:~$ cat hadoop-2.5.0-cdh5.3.2/etc/hadoop/core-site.xml

    <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>

    <property><name>hadoop.tmp.dir</name><value>/home/hduser/hdata</value></property>

    hduser@hadoop1:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ cat hdfs-site.xml

    <property><name>dfs.replication</name><value>1</value></property>

    <property><name>dfs.permissions.enabled</name><value>false</value></property>

    hduser@hadoop1:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$

    hduser@hadoop1:~$ hdfs dfs -ls /
    Found 2 items
    drwxrwxrwx – hduser supergroup 0 2015-08-18 14:17 /inputwords
    drwxrwxrwt – hduser supergroup 0 2015-08-18 15:57 /tmp

    hduser@hadoop1:~$ ls -l | grep hdata
    drwxrwxrwx 4 1777 hadoop 4096 Aug 18 20:44 hdata
    hduser@hadoop1:~$

    hduser@hadoop1:~$ hadoop fs -chmod -R 1777 /home/hduser/hdata
    chmod: `/home/hduser/hdata’: No such file or directory

    hduser@hadoop1:~$ sudo python mrjob-0.4/mrjob/examples/mr_word_freq_count.py mrjob-0.4/README.rst -r hadoop –hadoop-bin /home/hduser/hadoop-2.5.0-cdh5.3.2/bin -o hdfs:///tmp
    no configs found; falling back on auto-configuration
    no configs found; falling back on auto-configuration
    creating tmp directory /tmp/mr_word_freq_count.root.20150819.035017.487821
    Traceback (most recent call last):
    File “mrjob-0.4/mrjob/examples/mr_word_freq_count.py”, line 37, in
    MRWordFreqCount.run()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/job.py”, line 483, in run
    mr_job.execute()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/job.py”, line 501, in execute
    super(MRJob, self).execute()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/launch.py”, line 146, in execute
    self.run_job()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/launch.py”, line 207, in run_job
    runner.run()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/runner.py”, line 450, in run
    self._run()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 241, in _run
    self._upload_local_files_to_hdfs()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 267, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 275, in _mkdir_on_hdfs
    self.invoke_hadoop([‘fs’, ‘-mkdir’,’-p’, path])
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/fs/hadoop.py”, line 81, in invoke_hadoop
    proc = Popen(args, stdout=PIPE, stderr=PIPE)
    File “/usr/lib/python2.7/subprocess.py”, line 710, in __init__
    errread, errwrite)
    File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child
    raise child_exception
    OSError: [Errno 13] Permission denied
    hduser@hadoop1:~$

    • amalgjose says:

      Don’t keep any application storage inside any user’s home directory. Change /home/hduser/hdata to /app/hadoop/tmp. Before this, you have to create this directory locally as well as in HDFS.
      hadoop fs -mkdir -p /app/hadoop/tmp
      hadoop fs -chmod -R 777 /app/hadoop/tmp
      sudo mkdir -p /app/hadoop/tmp
      sudo chmod -R 777 /app/hadoop/tmp

      • upinder says:

        Hi Amal,
        I am getting the below error. Not sure why its still giving me an error could you help. :

        hduser@hadoop1:~$ hadoop fs -ls /user/training
        Found 1 items
        -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 /user/training/top-200.txt
        hduser@hadoop1:~$ hdfs dfs -ls /user/training
        Found 1 items
        -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 /user/training/top-200.txt
        hduser@hadoop1:~$

        hduser@hadoop1:~$ python Topbugc3count.py -r hadoop -v hdfs:///user/training/top-200.txt
        Deprecated option hdfs_scratch_dir has been renamed to hadoop_tmp_dir
        Unexpected option hdfs_tmp_dir
        looking for configs in /home/hduser/.mrjob.conf
        using configs in /home/hduser/.mrjob.conf
        Active configuration:
        {‘bootstrap_mrjob’: None,
        ‘check_input_paths’: True,
        ‘cleanup’: [‘ALL’],
        ‘cleanup_on_failure’: [‘NONE’],
        ‘cmdenv’: {},
        ‘hadoop_bin’: None,
        ‘hadoop_extra_args’: [],
        ‘hadoop_home’: ‘/usr/local/hadoop’,
        ‘hadoop_streaming_jar’: None,
        ‘hadoop_tmp_dir’: ‘tmp/mrjob’,
        ‘hadoop_version’: ‘0.20’,
        ‘interpreter’: None,
        ‘jobconf’: {},
        ‘label’: None,
        ‘local_tmp_dir’: ‘/tmp’,
        ‘owner’: ‘hduser’,
        ‘python_archives’: [],
        ‘python_bin’: None,
        ‘setup’: [],
        ‘setup_cmds’: [],
        ‘setup_scripts’: [],
        ‘sh_bin’: [‘sh’, ‘-ex’],
        ‘steps_interpreter’: None,
        ‘steps_python_bin’: None,
        ‘strict_protocols’: True,
        ‘upload_archives’: [],
        ‘upload_files’: []}
        Hadoop streaming jar is /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar
        > /usr/local/hadoop/bin/hadoop fs -ls hdfs:///user/training/top-200.txt
        STDOUT: -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 hdfs:///user/training/top-200.txt
        STDERR: 15/09/02 15:59:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        creating tmp directory /tmp/Topbugc3count.hduser.20150902.225937.231068
        archiving /usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob -> /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz as mrjob/
        writing wrapper script to /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh
        WRAPPER: # store $PWD
        WRAPPER: __mrjob_PWD=$PWD
        WRAPPER:
        WRAPPER: # obtain exclusive file lock
        WRAPPER: exec 9>/tmp/wrapper.lock.Topbugc3count.hduser.20150902.225937.231068
        WRAPPER: python -c ‘import fcntl; fcntl.flock(9, fcntl.LOCK_EX)’
        WRAPPER:
        WRAPPER: # setup commands
        WRAPPER: {
        WRAPPER: export PYTHONPATH=$__mrjob_PWD/mrjob.tar.gz:$PYTHONPATH
        WRAPPER: } 0&2
        WRAPPER:
        WRAPPER: # release exclusive file lock
        WRAPPER: exec 9>&-
        WRAPPER:
        WRAPPER: # run task from the original working directory
        WRAPPER: cd $__mrjob_PWD
        WRAPPER: “$@”
        Making directory hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/ on HDFS
        > /usr/local/hadoop/bin/hadoop version
        Using Hadoop version 2.6.0
        > /usr/local/hadoop/bin/hadoop fs -mkdir -p hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/
        STDERR: 15/09/02 15:59:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Copying local files into hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/
        Uploading /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh on HDFS
        > /usr/local/hadoop/bin/hadoop fs -put /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh
        STDERR: 15/09/02 15:59:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Uploading /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz on HDFS
        > /usr/local/hadoop/bin/hadoop fs -put /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz
        STDERR: 15/09/02 15:59:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Uploading /home/hduser/Topbugc3count.py -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py on HDFS
        > /usr/local/hadoop/bin/hadoop fs -put /home/hduser/Topbugc3count.py hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py
        STDERR: 15/09/02 15:59:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        > /usr/bin/python /home/hduser/Topbugc3count.py –steps
        running step 1 of 1
        > /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -files ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’ -archives ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’ -input hdfs:///user/training/top-200.txt -output hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output -mapper ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’ -combiner ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’ -reducer ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’
        HADOOP: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        HADOOP: session.id is deprecated. Instead, use dfs.metrics.session-id
        HADOOP: Initializing JVM Metrics with processName=JobTracker, sessionId=
        HADOOP: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= – already initialized
        HADOOP: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hduser797418092/.staging/job_local797418092_0001
        HADOOP: Error launching job , bad input path : File does not exist: /app/hadoop/tmp/mapred/staging/hduser797418092/.staging/job_local797418092_0001/archives/mrjob.tar.gz#mrjob.tar.gz
        HADOOP: Streaming Command Failed!
        Job failed with return code 512: [‘/usr/local/hadoop/bin/hadoop’, ‘jar’, ‘/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar’, ‘-files’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’, ‘-archives’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’, ‘-input’, ‘hdfs:///user/training/top-200.txt’, ‘-output’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output’, ‘-mapper’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’, ‘-combiner’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’, ‘-reducer’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’]
        Scanning logs for probable cause of failure
        Traceback (most recent call last):
        File “Topbugc3count.py”, line 20, in
        Topbugc3count.run()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/job.py”, line 433, in run
        mr_job.execute()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/job.py”, line 451, in execute
        super(MRJob, self).execute()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/launch.py”, line 160, in execute
        self.run_job()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/launch.py”, line 227, in run_job
        runner.run()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/runner.py”, line 452, in run
        self._run()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/hadoop.py”, line 235, in _run
        self._run_job_in_hadoop()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/hadoop.py”, line 372, in _run_job_in_hadoop
        raise CalledProcessError(returncode, step_args)
        subprocess.CalledProcessError: Command ‘[‘/usr/local/hadoop/bin/hadoop’, ‘jar’, ‘/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar’, ‘-files’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’, ‘-archives’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’, ‘-input’, ‘hdfs:///user/training/top-200.txt’, ‘-output’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output’, ‘-mapper’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’, ‘-combiner’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’, ‘-reducer’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’]’ returned non-zero exit status 512
        hduser@hadoop1:~$

  13. medha2008 says:

    Hi Amal,

    Thanks for the post on security. But I have a basic question. In this article you are creating users on the Linux hosts where the Hadoop cluster is deployed. How does the Hadoop cluster recognize the users that were created on the Linux hosts? Where is this user and group information stored, in the cluster or in its configuration?

    Thanks
    M

    • amalgjose says:

      We can use Linux users or LDAP users as Hadoop users. By default Hadoop uses Linux users. This is not stored anywhere in Hadoop. Any user on a client machine with proper access to HDFS and YARN can access Hadoop. By default all users are allowed to access Hadoop, but we can restrict the access to certain users/groups as well. All the directories and files in HDFS are associated with an owner, a group and a set of permissions. These permissions are stored in the namenode metadata.

  14. Naveen says:

    Hi Amal, to access my HDFS I need to kinit first. I have a few users who access HDFS through Hue, and I want to restrict those users from using HDFS through PuTTY. Is there any way to restrict the users from accessing HDFS through Linux? (E.g., a script which lets the user kinit and then closes PuTTY automatically, so that the user can access HDFS only through Hue.)

    • amalgjose says:

      Hi,
      How did you configure the login? Are you using LDAP? In that case you don’t have to worry; the ticket refresh will happen at login itself. If this is not happening, you can create a cron job to renew the ticket, and in this way you can avoid the headache of doing the kinit manually. Set a constant ticket cache file and set the property in the bashrc of the user.
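
      A rough sketch of such a cron entry, assuming a keytab is available (the principal, keytab path, and cache file below are hypothetical and must be adapted to your environment):

      # renew the Kerberos ticket every 8 hours into a fixed cache file
      0 */8 * * * kinit -c /tmp/krb5cc_hduser -kt /home/hduser/hduser.keytab hduser@EXAMPLE.COM

      # and in the user's ~/.bashrc, point Hadoop clients at that cache file
      export KRB5CCNAME=/tmp/krb5cc_hduser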

  15. Pingback: Step-by-Step Guide to Setting Up an R-Hadoop System | Rhadoop

  16. rebwar says:

    hduser@hadoopmaster:/$ hadoop fs -chmod -R 777 /app/hdaoop/tmp
    Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It’s highly recommended that you fix the library with ‘execstack -c ‘, or link it with ‘-z noexecstack’.
    16/02/18 08:01:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    chmod: `/app/hdaoop/tmp’: No such file or directory

  17. Abhinav says:

    how can i get list of user and groups?

    • amalgjose says:

      You cannot list Hadoop users and groups. Hadoop users and groups are Unix users with Hadoop permissions, so there is no command to list the users/groups that have access to Hadoop.

  18. ProQuotient says:

    Hadoop has become one of the most useful skills to have for handling big data, which is why many people are trying to learn Hadoop, and tutorials such as this make the learning process much easier and more fun. Thanks for sharing this great article.
