Setting Up Multiple Users in Hadoop Clusters

 

Need for multiple users

In Hadoop, we run different tasks and store data in HDFS.

If several users run jobs using the same user account, it will be difficult to trace the jobs and to track the tasks and defects of each individual user.

The other issue is security.

If everyone is given the same user account, all users will have the same privileges: anyone can access everyone else’s data, modify it, run jobs against it, and even delete it.

This is a very serious issue.

To avoid this, we need to create multiple user accounts.

Benefits of creating multiple users

1) A user cannot modify the directories or files of other users.

2) Other users cannot add new files to a user’s directory.

3) Other users cannot perform any tasks (MapReduce jobs, etc.) on a user’s files.

In short, data is safe and is accessible only to the assigned user and the superuser.

Steps for setting up multiple user accounts

To add a new user capable of performing Hadoop operations, follow the steps below.

Step 1

Creating a New User

For Ubuntu

sudo adduser --ingroup <groupname> <username>

For RedHat variants

useradd -g <groupname> <username>

passwd <username>

Then enter the user details and password.
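
For example, assuming a group named hadoopusers and a user named user1 (both names are only illustrative), the Ubuntu commands could look like this:

sudo addgroup hadoopusers
sudo adduser --ingroup hadoopusers user1

The adduser command will then prompt interactively for the password and the other user details.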

Step 2

We need to change the permissions of the directory in HDFS where Hadoop stores its temporary data.

Open the core-site.xml file

Find the value of hadoop.tmp.dir.

In my core-site.xml, it is /app/hadoop/tmp. In the following steps, I will be using /app/hadoop/tmp as my directory for storing Hadoop data (i.e., the value of hadoop.tmp.dir).
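
For reference, the property usually appears in core-site.xml roughly like this (the value shown is the one from my setup; yours may differ):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>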

Then, from the superuser account, run the following command. (The leading 1 in mode 1777 sets the sticky bit, so every user can create files in the shared staging directory but cannot delete or move files belonging to other users.)

hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging

Step 3

The next step is to give write permission to our user group on hadoop.tmp.dir (here /app/hadoop/tmp; open core-site.xml to get the path for hadoop.tmp.dir). This should be done only on the machine (node) where the new user is added.

chmod 777 /app/hadoop/tmp
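
To confirm the change, you can check the local directory’s permissions; after chmod 777 the mode should read rwxrwxrwx:

ls -ld /app/hadoop/tmp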

Step 4

The next step is to create a directory structure in HDFS for the new user.

For that, create the directory structure as the superuser.

Eg: hadoop fs -mkdir /user/username/

Step 5

At this point we will still not be able to run MapReduce programs as the new user, because the ownership of the newly created directory structure lies with the superuser. So change the ownership of the newly created directory in HDFS to the new user.

hadoop fs -chown -R username:groupname <directory to access in HDFS>

Eg: hadoop fs -chown -R username:groupname /user/username/
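
To confirm the new ownership, list the parent directory from any account; the entry for /user/user1 (using the illustrative names from Step 1) should now show user1 as the owner and hadoopusers as the group:

hadoop fs -ls /user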

Step 6

Log in as the new user and run Hadoop jobs.

su - username

Note: Run Hadoop tasks only in the assigned HDFS path, i.e., /user/username.
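
As a quick sanity check (the user name and file name below are only illustrative), the new user can copy a local file into their HDFS directory and list it:

su - user1
hadoop fs -put sample.txt /user/user1/
hadoop fs -ls /user/user1/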
Enjoy…. 🙂



40 Responses to Setting Up Multiple Users in Hadoop Clusters

  1. bongirr says:

    I followed all the steps until this below gives me an error:
    hadoop fs -chown -R anand:hadoop /user/anand/

    Error:
    chown: changing ownership of ‘/user/anand’: Non-super user cannot change owner

  2. amalgjose says:

    Hi Bonjirr,
    The error is because you are trying to change the ownership from a non-super user.
    As which user did you create the directory /user/anand?
    Try
    hadoop fs -ls /
    and find the owner of /user directory or find the superuser.
    Then from the superuser, execute the command
    hadoop fs -chown -R anand:hadoop /user/anand/
    Eg: If the superuser is hadoop, then from root user execute the following command.
    sudo -u hadoop hadoop fs -chown -R anand:hadoop /user/anand/
    This will work.

  3. Sudhakar says:

    Hi Amal,

    hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging
    Instead of changing permissions like above, can i not set up the hadoop.tmp.dir property itself to use something like /app/hadoop/${user.name}. This will ensure that the temporary data for each user will be stored in his folder under the /app/hadoop folder.

    Since the folder does not get created automatically, I can create a directory for each user in the /app/hadoop folder and set appropriate permissions here and NOT in hadoop fs shell.

    Will this not work? Let me know what you think.

    Thanks.

    • amalgjose says:

      Yes, you can set that. But the problem is that inside the staging directory in HDFS, sub-directories are automatically created based on the username. So if you follow the method you explained, the admin has to create this directory structure for every user.
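
      For clarity, the approach Sudhakar describes would look roughly like this in core-site.xml (a sketch only, not the configuration used in this post):

      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/${user.name}</value>
      </property>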

    • tudor lapusan says:

      Hi Sudhakar,
      It took a long time after you put the comment, but still… 😉
      I’m using hadoop-1.1.2.
      To allow multiple users to run jobs on my cluster, I needed to add write permission for the group on the following directories:
      1. hadoop fs -chmod g+w /app/hadoop/tmp/mapred/staging => rwxrwxr-x
      2. sudo chmod g+w tmp_directory (your local directory on the master node) => rwxrwxr-x

      In the HDFS staging directory (/app/hadoop/tmp/mapred/staging), a new directory is created with the name of the user that runs the job, so it is created automatically with the right permissions to run jobs.

  4. Ashish Dobhal says:

    Hii Amal:
    I am using the Hadoop 1.03 version and there is no staging directory inside my temp directory. There’s only a local directory inside it.

  5. sanjay says:

    Hi ,

    I am running Hadoop from the superuser, and while running MapReduce from another user it is not able to connect to the namenode because on the other machine no daemons are up (nothing shows in jps). Can you suggest anything?

    • amalgjose says:

      For submitting a mapreduce job, it is not necessary to submit from a machine where hadoop services are running.
      We can submit any jobs from a hadoop client.
      A hadoop client is just the hadoop libraries and proper configuration files.
      For jps, we need java to be installed and added to path.
      Check the hadoop config files in 2nd machine and ensure that it is same as that of 1st machine.
      Clear the datanode storage directory and start the services again.
      Check the log files for details.

    • amalgjose says:

      Check the config files core-site.xml and mapred-site.xml, and verify the properties fs.default.name and mapred.job.tracker.

  6. Riata says:

    Hi, thanks for sharing this… now I have a better understanding of how users work on Hadoop, really well explained. I recently installed Cloudera 5.1. I cannot find the tmp dir in the directories you mentioned before. Can you help me?

  7. bindu says:

    While I am running the command hadoop fs -chmod -R 1777 /app/hadoop/tmp/ it is showing an error, i.e., hadoop: command not found

    • amalgjose says:

      Hi,
      The reason for this issue is that your HADOOP_HOME/bin is not added to the PATH.
      Set the following variables in the .bashrc file
      nano ~/.bashrc

      export HADOOP_HOME=<path to hadoop home directory>
      export PATH=$PATH:$HADOOP_HOME/bin

      save and exit
      Then type
      source ~/.bashrc

      Then type hadoop.
      Else, go to $HADOOP_HOME/bin, then execute the command hadoop fs -chmod -R 1777 /app/hadoop/tmp/

  8. bindu says:

    ya i had set the path initially. actually i created user called hduser and from that user i installed hadoop.now if i run the chmod command its showing “chmod: Unknown command
    Usage: java FsShell ………….”
    some syntax it will show and in last line its showing
    “The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions] ”
    so i used the command bin/ hadoop fs –chmod -R 1777 /app/hadoop/tmp/
    now is showing
    bash: bin/hadoop: No such file or directory
    what exactly wrong i did can u have any idea???
    please help me

    • Doriane says:

      Hello,
      I also have this problem. I tried bin/hadoop fs -chmod -R 1777 … and it answered chmod unknown command then I tried … fs chmod … and it answered “chmod: Unknown command
      Did you mean -chmod? This command begins with a dash.”
      It’s turning me crazy
      please help us 2

      • amalgjose says:

        Seems like $HADOOP_HOME/bin is not added to the path. Add this to the path. Then try typing hadoop in the commandline. If that command exists, then you can try the command
        hadoop fs -chmod -R 1777

      • Doriane says:

        ok amalgjose, I’ll try this, Can you explain the aim of doing this command line ?

      • amalgjose says:

        Hadoop has a chmod command, and the syntax is correct. The problem probably occurred because of an issue with the path settings. If you are executing the command as bin/hadoop, you have to execute it from the $HADOOP_HOME directory; otherwise simply use hadoop.

    • amalgjose says:

      Seems like hadoop is not added to the path.
      Go to $HADOOP_HOME and execute this command

  9. sthapar says:

    Hi Amal, I have a cluster with 4 slaves and 1 master. I am trying to create multiple use accounts but I am not sure if I should do that on each node individually or only on the master?

    • amalgjose says:

      In a hadoop cluster, ideally users will not be in all the nodes to access the cluster. The access will be given from only the client/edge node. So the users should be created in that node only. In your case, you can make one node as client node and you can create the users in that node.

  10. sthapar says:

    Also, since I am using Hadoop 2.6.0, the file path “hadoop fs –chmod -R 1777 /app/hadoop/tmp/mapred/staging” in this command does not exist. Can you tell me what is the corresponding path in version 2.6.0? My hadoop.tmp.dir value is /app/hadoop/tmp.

    • amalgjose says:

      The location will be {hadoop.tmp.dir}/mapred/staging. Check whether you set hadoop.tmp.dir in core-site.xml. If not, the directory will be /tmp. This will be created only when you run a mapreduce job.

  11. rajat says:

    This command has kind of ruined my set up

    hadoop fs -chmod -R 777 /home/hduser/hadoop/tmp

    Now any x, y, z user with the same config files as the Hadoop slave node can access and write to any directory of HDFS.
    Suggest how to restrict access to one directory only.

    • amalgjose says:

      How come this ruined your setup? It might be because of some other configs done by you. Changing the permission of your staging directory will not cause any problem to any user’s directory. Check your hdfs-site.xml for the dfs.permissions.enabled property. By default it will be true. If you set it to false, it may ruin your setup. A lot of people have used this config, and I myself used it in several places. It was working perfectly.
      This was a post that I wrote two years ago. A lot of things have changed since then, so this may not be a perfect solution now.
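
      For reference, the property mentioned above sits in hdfs-site.xml and looks like this (true is the default, and it should stay true if you want per-user isolation):

      <property>
        <name>dfs.permissions.enabled</name>
        <value>true</value>
      </property>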

  12. upinder says:

    Hi Amal,

    I am having issues running mrjob and I tried your suggestion and I am getting permission error. could you give some suggestions ?

    hduser@hadoop1:~$ groups hduser
    hduser : hadoop sudo

    hduser@hadoop1:~$ cat hadoop-2.5.0-cdh5.3.2/etc/hadoop/core-site.xml

    <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>

    <property><name>hadoop.tmp.dir</name><value>/home/hduser/hdata</value></property>

    hduser@hadoop1:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ cat hdfs-site.xml

    <property><name>dfs.replication</name><value>1</value></property>

    <property><name>dfs.permissions.enabled</name><value>false</value></property>

    hduser@hadoop1:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$

    hduser@hadoop1:~$ hdfs dfs -ls /
    Found 2 items
    drwxrwxrwx – hduser supergroup 0 2015-08-18 14:17 /inputwords
    drwxrwxrwt – hduser supergroup 0 2015-08-18 15:57 /tmp

    hduser@hadoop1:~$ ls -l | grep hdata
    drwxrwxrwx 4 1777 hadoop 4096 Aug 18 20:44 hdata
    hduser@hadoop1:~$

    hduser@hadoop1:~$ hadoop fs -chmod -R 1777 /home/hduser/hdata
    chmod: `/home/hduser/hdata’: No such file or directory

    hduser@hadoop1:~$ sudo python mrjob-0.4/mrjob/examples/mr_word_freq_count.py mrjob-0.4/README.rst -r hadoop –hadoop-bin /home/hduser/hadoop-2.5.0-cdh5.3.2/bin -o hdfs:///tmp
    no configs found; falling back on auto-configuration
    no configs found; falling back on auto-configuration
    creating tmp directory /tmp/mr_word_freq_count.root.20150819.035017.487821
    Traceback (most recent call last):
    File “mrjob-0.4/mrjob/examples/mr_word_freq_count.py”, line 37, in
    MRWordFreqCount.run()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/job.py”, line 483, in run
    mr_job.execute()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/job.py”, line 501, in execute
    super(MRJob, self).execute()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/launch.py”, line 146, in execute
    self.run_job()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/launch.py”, line 207, in run_job
    runner.run()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/runner.py”, line 450, in run
    self._run()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 241, in _run
    self._upload_local_files_to_hdfs()
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 267, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 275, in _mkdir_on_hdfs
    self.invoke_hadoop([‘fs’, ‘-mkdir’,’-p’, path])
    File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/fs/hadoop.py”, line 81, in invoke_hadoop
    proc = Popen(args, stdout=PIPE, stderr=PIPE)
    File “/usr/lib/python2.7/subprocess.py”, line 710, in __init__
    errread, errwrite)
    File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child
    raise child_exception
    OSError: [Errno 13] Permission denied
    hduser@hadoop1:~$

    • amalgjose says:

      Don’t keep any application storage inside any user’s home directory. Change /home/hduser/hdata to /app/hadoop/tmp. Before this, you have to create this directory locally as well as in HDFS.
      hadoop fs -mkdir -p /app/hadoop/tmp
      hadoop fs -chmod -R 777 /app/hadoop/tmp
      sudo mkdir -p /app/hadoop/tmp
      sudo chmod -R 777 /app/hadoop/tmp

      • upinder says:

        Hi Amal,
        I am getting the below error. Not sure why its still giving me an error could you help. :

        hduser@hadoop1:~$ hadoop fs -ls /user/training
        Found 1 items
        -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 /user/training/top-200.txt
        hduser@hadoop1:~$ hdfs dfs -ls /user/training
        Found 1 items
        -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 /user/training/top-200.txt
        hduser@hadoop1:~$

        hduser@hadoop1:~$ python Topbugc3count.py -r hadoop -v hdfs:///user/training/top-200.txt
        Deprecated option hdfs_scratch_dir has been renamed to hadoop_tmp_dir
        Unexpected option hdfs_tmp_dir
        looking for configs in /home/hduser/.mrjob.conf
        using configs in /home/hduser/.mrjob.conf
        Active configuration:
        {‘bootstrap_mrjob’: None,
        ‘check_input_paths’: True,
        ‘cleanup’: [‘ALL’],
        ‘cleanup_on_failure’: [‘NONE’],
        ‘cmdenv’: {},
        ‘hadoop_bin’: None,
        ‘hadoop_extra_args’: [],
        ‘hadoop_home’: ‘/usr/local/hadoop’,
        ‘hadoop_streaming_jar’: None,
        ‘hadoop_tmp_dir’: ‘tmp/mrjob’,
        ‘hadoop_version’: ‘0.20’,
        ‘interpreter’: None,
        ‘jobconf’: {},
        ‘label’: None,
        ‘local_tmp_dir’: ‘/tmp’,
        ‘owner’: ‘hduser’,
        ‘python_archives’: [],
        ‘python_bin’: None,
        ‘setup’: [],
        ‘setup_cmds’: [],
        ‘setup_scripts’: [],
        ‘sh_bin’: [‘sh’, ‘-ex’],
        ‘steps_interpreter’: None,
        ‘steps_python_bin’: None,
        ‘strict_protocols’: True,
        ‘upload_archives’: [],
        ‘upload_files’: []}
        Hadoop streaming jar is /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar
        > /usr/local/hadoop/bin/hadoop fs -ls hdfs:///user/training/top-200.txt
        STDOUT: -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 hdfs:///user/training/top-200.txt
        STDERR: 15/09/02 15:59:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        creating tmp directory /tmp/Topbugc3count.hduser.20150902.225937.231068
        archiving /usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob -> /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz as mrjob/
        writing wrapper script to /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh
        WRAPPER: # store $PWD
        WRAPPER: __mrjob_PWD=$PWD
        WRAPPER:
        WRAPPER: # obtain exclusive file lock
        WRAPPER: exec 9>/tmp/wrapper.lock.Topbugc3count.hduser.20150902.225937.231068
        WRAPPER: python -c ‘import fcntl; fcntl.flock(9, fcntl.LOCK_EX)’
        WRAPPER:
        WRAPPER: # setup commands
        WRAPPER: {
        WRAPPER: export PYTHONPATH=$__mrjob_PWD/mrjob.tar.gz:$PYTHONPATH
        WRAPPER: } 0&2
        WRAPPER:
        WRAPPER: # release exclusive file lock
        WRAPPER: exec 9>&-
        WRAPPER:
        WRAPPER: # run task from the original working directory
        WRAPPER: cd $__mrjob_PWD
        WRAPPER: “$@”
        Making directory hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/ on HDFS
        > /usr/local/hadoop/bin/hadoop version
        Using Hadoop version 2.6.0
        > /usr/local/hadoop/bin/hadoop fs -mkdir -p hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/
        STDERR: 15/09/02 15:59:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Copying local files into hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/
        Uploading /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh on HDFS
        > /usr/local/hadoop/bin/hadoop fs -put /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh
        STDERR: 15/09/02 15:59:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Uploading /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz on HDFS
        > /usr/local/hadoop/bin/hadoop fs -put /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz
        STDERR: 15/09/02 15:59:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Uploading /home/hduser/Topbugc3count.py -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py on HDFS
        > /usr/local/hadoop/bin/hadoop fs -put /home/hduser/Topbugc3count.py hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py
        STDERR: 15/09/02 15:59:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        > /usr/bin/python /home/hduser/Topbugc3count.py –steps
        running step 1 of 1
        > /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -files ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’ -archives ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’ -input hdfs:///user/training/top-200.txt -output hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output -mapper ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’ -combiner ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’ -reducer ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’
        HADOOP: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        HADOOP: session.id is deprecated. Instead, use dfs.metrics.session-id
        HADOOP: Initializing JVM Metrics with processName=JobTracker, sessionId=
        HADOOP: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= – already initialized
        HADOOP: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hduser797418092/.staging/job_local797418092_0001
        HADOOP: Error launching job , bad input path : File does not exist: /app/hadoop/tmp/mapred/staging/hduser797418092/.staging/job_local797418092_0001/archives/mrjob.tar.gz#mrjob.tar.gz
        HADOOP: Streaming Command Failed!
        Job failed with return code 512: [‘/usr/local/hadoop/bin/hadoop’, ‘jar’, ‘/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar’, ‘-files’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’, ‘-archives’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’, ‘-input’, ‘hdfs:///user/training/top-200.txt’, ‘-output’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output’, ‘-mapper’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’, ‘-combiner’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’, ‘-reducer’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’]
        Scanning logs for probable cause of failure
        Traceback (most recent call last):
        File “Topbugc3count.py”, line 20, in
        Topbugc3count.run()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/job.py”, line 433, in run
        mr_job.execute()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/job.py”, line 451, in execute
        super(MRJob, self).execute()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/launch.py”, line 160, in execute
        self.run_job()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/launch.py”, line 227, in run_job
        runner.run()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/runner.py”, line 452, in run
        self._run()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/hadoop.py”, line 235, in _run
        self._run_job_in_hadoop()
        File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/hadoop.py”, line 372, in _run_job_in_hadoop
        raise CalledProcessError(returncode, step_args)
        subprocess.CalledProcessError: Command ‘[‘/usr/local/hadoop/bin/hadoop’, ‘jar’, ‘/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar’, ‘-files’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’, ‘-archives’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’, ‘-input’, ‘hdfs:///user/training/top-200.txt’, ‘-output’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output’, ‘-mapper’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’, ‘-combiner’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’, ‘-reducer’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’]’ returned non-zero exit status 512
        hduser@hadoop1:~$

  13. medha2008 says:

    Hi Amal,

    Thanks for the post on security. But I have a basic question. In this article you are creating users on the Linux hosts where the Hadoop cluster is deployed. How does the Hadoop cluster recognize the users that were created on the Linux hosts? Where is this user and group information stored, in the cluster or in its configuration?

    Thanks
    M

    • amalgjose says:

      We can use Linux users or LDAP users as Hadoop users. By default Hadoop uses Linux users. This is not stored anywhere in Hadoop. Any user on a client machine with proper access to HDFS and YARN can access Hadoop. By default all users are allowed to access Hadoop, but we can restrict the access to certain users/groups as well. All the directories and files in HDFS are associated with an owner, a group and a set of permissions. These permissions are stored in the namenode metadata.

  14. Naveen says:

    Hi Amal, to access my HDFS I need to kinit first. I have a few users who access HDFS through Hue, and I want to restrict those users from using HDFS through PuTTY. Is there any way to restrict the users from accessing HDFS through Linux? (E.g., a script which lets the user kinit and then closes PuTTY automatically, so that the user can access HDFS only through Hue.)

    • amalgjose says:

      Hi,
      How did you configure the login? Are you using LDAP? In that case you don’t have to worry; the ticket refresh will happen at login itself. If this is not happening, you can create a cron job to renew the ticket, and in this way you can avoid the headache of doing the kinit manually. Set a constant ticket cache file and set the property in the bashrc of the user.
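
      A rough sketch of such a cron entry, assuming a keytab is available (the principal, keytab path, and cache file below are hypothetical and must be adapted to your environment):

      # renew the Kerberos ticket every 8 hours into a fixed cache file
      0 */8 * * * kinit -c /tmp/krb5cc_hduser -kt /home/hduser/hduser.keytab hduser@EXAMPLE.COM

      # and in the user's ~/.bashrc, point Hadoop clients at that cache file
      export KRB5CCNAME=/tmp/krb5cc_hduser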

  15. Pingback: Step-by-Step Guide to Setting Up an R-Hadoop System | Rhadoop

  16. rebwar says:

    hduser@hadoopmaster:/$ hadoop fs -chmod -R 777 /app/hdaoop/tmp
    Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It’s highly recommended that you fix the library with ‘execstack -c ‘, or link it with ‘-z noexecstack’.
    16/02/18 08:01:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    chmod: `/app/hdaoop/tmp’: No such file or directory

  17. Abhinav says:

    how can i get list of user and groups?

    • amalgjose says:

      You cannot list Hadoop users and groups. Hadoop users and groups are Unix users with Hadoop permissions, so there is no command to list the users/groups that have access to Hadoop.

  18. ProQuotient says:

    Hadoop has become one of the most useful skills to have for handling big data, which is why many people are trying to learn Hadoop, and tutorials such as this make the learning process much easier and more fun. Thanks for sharing this great article.
