Need for multiple users
In Hadoop we run various jobs and store data in HDFS.
If several people run jobs from the same user account, it becomes difficult to trace the jobs and track the tasks/defects belonging to each user.
The other issue is security.
If everyone is given the same user account, all users have the same privileges: anyone can read everyone else's data, modify it, run jobs against it, or even delete it.
This is a very serious issue.
To avoid this, we need to create multiple user accounts.
Benefits of Creating multiple users
1) A user's directories/files cannot be modified by other users.
2) Other users cannot add new files to a user's directory.
3) Other users cannot run any tasks (MapReduce etc.) on a user's files.
In short, the data is safe and accessible only to its owner and the superuser.
Steps for setting up multiple user accounts
To add a new user capable of performing Hadoop operations, follow the steps below.
Step 1
Creating a New User
For Ubuntu
sudo adduser --ingroup <groupname> <username>
For RedHat variants
useradd -g <groupname> <username>
passwd <username>
Then enter the user details and password.
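For example, to create a user named hduser2 in an existing group named hadoop (both names are only illustrations; substitute your own group and username):
On Ubuntu: sudo adduser --ingroup hadoop hduser2
On RedHat variants: useradd -g hadoop hduser2, followed by passwd hduser2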
Step 2
We need to change the permission of the directory in HDFS where Hadoop stores its temporary data.
Open the core-site.xml file and find the value of hadoop.tmp.dir.
In my core-site.xml, it is /app/hadoop/tmp. In the following steps, I will be using /app/hadoop/tmp as my directory for storing Hadoop data (i.e., the value of hadoop.tmp.dir).
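For reference, the relevant property in my core-site.xml looks like this (the value shown is from my setup; yours may differ):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>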
Then, from the superuser account, run the following command.
hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging
The leading 1 sets the sticky bit, so every user can create files under the staging directory but cannot delete or rename files owned by other users.
Step 3
The next step is to give write permission to our user group on hadoop.tmp.dir (here /app/hadoop/tmp; check core-site.xml for the hadoop.tmp.dir path). This should be done only on the machine (node) where the new user was added.
chmod 777 /app/hadoop/tmp
Step 4
The next step is to create a directory structure in HDFS for the new user.
From the superuser account, create the directory structure.
E.g.: hadoop fs -mkdir /user/username/
Step 5
At this point we will still not be able to run MapReduce programs, because the ownership of the newly created directory structure is with the superuser. So change the ownership of the newly created directory in HDFS to the new user.
hadoop fs -chown -R username:groupname <directory to access in HDFS>
E.g.: hadoop fs -chown -R username:groupname /user/username/
Step 6
Log in as the new user and run Hadoop jobs.
su - username
Note: Run Hadoop tasks only inside the user's assigned HDFS directory, i.e., /user/username.
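As a quick sanity check, you can run one of the bundled example jobs as the new user (the file names below are placeholders, and the examples jar name/location varies with your Hadoop version and distribution):
su - username
hadoop fs -put localfile.txt /user/username/input.txt
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /user/username/input.txt /user/username/wc-out
hadoop fs -ls /user/username/wc-out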
Enjoy…. 🙂
I followed all the steps, but the command below gives me an error:
hadoop fs -chown -R anand:hadoop /user/anand/
Error:
chown: changing ownership of ‘/user/anand’: Non-super user cannot change owner
Hi Bonjirr,
The error is because you are trying to change the ownership from a non-superuser account.
As which user did you create the directory /user/anand?
Try
hadoop fs -ls /
and find the owner of the /user directory, which is the superuser.
Then from the superuser, execute the command
hadoop fs -chown -R anand:hadoop /user/anand/
E.g., if the superuser is hadoop, then from the root user execute the following command.
sudo -u hadoop hadoop fs -chown -R anand:hadoop /user/anand/
This will work.
Hi Amal,
hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging
Instead of changing permissions like above, can I not set the hadoop.tmp.dir property itself to something like /app/hadoop/${user.name}? This would ensure that the temporary data for each user is stored in his own folder under /app/hadoop.
Since the folder does not get created automatically, I can create a directory for each user in the /app/hadoop folder and set appropriate permissions there, and NOT in the hadoop fs shell.
Will this not work? Let me know what you think.
Thanks.
Yes, you can set that. But the problem is that inside the staging directory in HDFS, sub-directories are created automatically based on the username. So if you follow the method you explained, the admin has to create this directory structure for every user.
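For anyone who wants to experiment with the per-user approach discussed above, the property would look something like this in core-site.xml (illustrative only; as noted, the per-user directories would still have to be created and permissioned by the admin):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/${user.name}</value>
</property>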
Hi Sudhakar,
It has been a long time since you posted the comment, but still… 😉
I’m using hadoop-1.1.2.
To allow multiple users to run jobs on my cluster, I needed to add write permission for the group on the following directories:
1. hadoop fs -chmod g+w /app/hadoop/tmp/mapred/staging => rwxrwxr-x
2. sudo chmod g+w tmp_directory (your local directory on the master node) => rwxrwxr-x
In the HDFS staging directory (/app/hadoop/tmp/mapred/staging), a new directory is created with the name of the user that runs the job, so it is created automatically with the right permissions to run jobs.
Hi Amal:
I am using Hadoop 1.0.3 and there is no staging directory inside my temp directory. There's only a local directory inside it.
Hi Ashish, please check your HDFS location. I think you checked the local Linux file system.
Please check and reply
Hi,
I am running Hadoop from the superuser, and while running MapReduce from another user it is not able to connect to the namenode, because on the other machine jps shows no nodes up. Can you suggest anything?
To submit a MapReduce job, it is not necessary to submit it from a machine where the Hadoop services are running.
We can submit jobs from a Hadoop client.
A Hadoop client is just the Hadoop libraries plus the proper configuration files.
For jps, Java needs to be installed and added to the path.
Check the Hadoop config files on the second machine and ensure they are the same as on the first machine.
Clear the datanode storage directory and start the services again.
Check the log files for details.
Check the config files core-site.xml and mapred-site.xml, and verify the properties fs.default.name and mapred.job.tracker.
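For example, on a Hadoop 1.x setup those properties usually look something like the following (the hostname and ports here are placeholders and must match your cluster):
In core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>
In mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>namenode-host:9001</value>
</property>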
Hi, thanks for sharing this… now I have a better understanding of how users work in Hadoop, really well explained. I recently installed Cloudera 5.1. I cannot find the tmp dir in the directories you mentioned before. Can you help me?
The temp directory is the value of the property hadoop.tmp.dir.
If you are not setting the value in your config file, check the value in core-default.xml.
https://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/core-default.xml
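If you have not overridden it, core-default.xml typically defines the property like this, so the temporary data ends up under /tmp:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>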
While running the command hadoop fs -chmod -R 1777 /app/hadoop/tmp/ it shows an error: hadoop: command not found.
Hi,
The reason for this issue is that HADOOP_HOME/bin is not added to the PATH.
Set the following variables in the .bashrc file
nano ~/.bashrc
export HADOOP_HOME=<path to hadoop home directory>
export PATH=$PATH:$HADOOP_HOME/bin
save and exit
Then type
source ~/.bashrc
Then type hadoop.
Else, go to $HADOOP_HOME/bin and execute the command ./hadoop fs -chmod -R 1777 /app/hadoop/tmp/
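For example, assuming Hadoop is extracted to /usr/local/hadoop (adjust to your actual installation path), the .bashrc entries would be:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin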
Yeah, I had set the path initially. Actually I created a user called hduser and installed Hadoop from that user. Now if I run the chmod command it shows "chmod: Unknown command
Usage: java FsShell ………….."
It shows some syntax help, and in the last line it shows
"The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]"
So I used the command bin/ hadoop fs –chmod -R 1777 /app/hadoop/tmp/
and now it shows
bash: bin/hadoop: No such file or directory
What exactly did I do wrong? Do you have any idea?
Please help me.
Hello,
I also have this problem. I tried bin/hadoop fs -chmod -R 1777 … and it answered chmod: Unknown command. Then I tried … fs chmod … and it answered "chmod: Unknown command
Did you mean -chmod? This command begins with a dash."
It's driving me crazy.
Please help us too.
Seems like $HADOOP_HOME/bin is not added to the path. Add it to the path, then try typing hadoop on the command line. If that command exists, then you can try the command
hadoop fs -chmod -R 1777
OK amalgjose, I'll try this. Can you explain the aim of this command?
Hadoop has a chmod command, and the syntax is correct. The problem may have occurred because of an issue with the path settings. If you are executing the command as bin/hadoop, you have to execute it from the $HADOOP_HOME directory; otherwise, simply use hadoop.
Seems like hadoop is not added to the path.
Go to $HADOOP_HOME and execute this command
Hi Amal, I have a cluster with 4 slaves and 1 master. I am trying to create multiple user accounts, but I am not sure if I should do that on each node individually or only on the master?
In a Hadoop cluster, users ideally do not need to be present on all the nodes to access the cluster. Access is given only from the client/edge node, so the users should be created on that node only. In your case, you can designate one node as the client node and create the users on that node.
Also, since I am using Hadoop 2.6.0, the path used in the command "hadoop fs -chmod -R 1777 /app/hadoop/tmp/mapred/staging" does not exist. Can you tell me what the corresponding path is in version 2.6.0? My hadoop.tmp.dir value is /app/hadoop/tmp.
The location will be {hadoop.tmp.dir}/mapred/staging. Check whether you set hadoop.tmp.dir in core-site.xml. If not, the directory will be under /tmp. This directory is created only after you run a MapReduce job.
Hi Amal,
Certainly helpful. A few observations:
a) chmod 1777 should be replaced with chmod 777.
b) Second, after setting up the new user, I am still able to see all the HDFS files from the new user despite changing the permissions.
rajat@ubuntu:~/hadoop/bin$ ./hadoop fs -ls
Found 1 items
-rw-r--r-- 2 rajat hadoop 2033 2015-05-17 14:48 /user/rajat/mapred-queue-acls.xml
rajat@ubuntu:~/hadoop/bin$ hadoop f s-ls /
hadoop: command not found
rajat@ubuntu:~/hadoop/bin$ ./hadoop fs -ls
Found 1 items
-rw-r--r-- 2 rajat hadoop 2033 2015-05-17 14:48 /user/rajat/mapred-queue-acls.xml
rajat@ubuntu:~/hadoop/bin$ hadoop fs -ls /
hadoop: command not found
rajat@ubuntu:~/hadoop/bin$ ./hadoop fs -ls
Found 1 items
-rw-r--r-- 2 rajat hadoop 2033 2015-05-17 14:48 /user/rajat/mapred-queue-acls.xml
rajat@ubuntu:~/hadoop/bin$ ./hadoop fs -ls /
Found 7 items
drwxr-xr-x - hduser supergroup 0 2015-05-17 03:43 /amdocs
drwxr-xr-x - hduser supergroup 0 2015-05-17 07:10 /benchmarks
drwx------ - hduser supergroup 0 2015-05-17 14:49 /hbase
drwxr-xr-x - hduser supergroup 0 2015-05-17 06:07 /home
drwxr-xr-x - hduser supergroup 0 2015-05-17 02:39 /system
drwxr-xr-x - hduser supergroup 0 2015-05-10 11:48 /tmp
drwxr-xr-x - hduser supergroup 0 2015-05-17 14:46 /user
rajat@ubuntu:~/hadoop/bin$ ./hadoop fs -ls /hbase/
Found 14 items
.
.
-rw------- 2 hduser supergroup 3 2015-05-09 06:04 /hbase/hbase.version
-rw------- 2 rajat supergroup 2033 2015-05-17 14:49 /hbase/mapred-queue-acls.xml
drwx------ - hduser supergroup 0 2015-05-09 06:33 /hbase/t1
drwx------ - hduser supergroup 0 2015-05-12 09:39 /hbase/test_table
drwx------ - hduser supergroup 0 2015-05-12 09:39 /hbase/test_table_copy
drwx------ - hduser supergroup 0 2015-05-09 14:02 /hbase/users
rajat@ubuntu:~/hadoop/bin$ id
uid=1002(rajat) gid=1001(hadoop) groups=1001(hadoop)
rajat@ubuntu:~/hadoop/bin$
hduser@master:~/hadoop/conf$ id
uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop)
hduser@master:~/hadoop/conf$
Why am I able to write files from the new user rajat to the hbase directory with 700 permissions? (The group is common, but still it's 700.)
This command has kind of ruined my setup:
hadoop fs -chmod -R 777 /home/hduser/hadoop/tmp
Now any x, y, z user with the same config files as a Hadoop slave node can access and write to any directory of HDFS.
Please suggest how to restrict access to one directory only.
How could this have ruined your setup? It might be because of some other config changes you made. Changing the permission of your staging directory will not cause any problem for any user directory. Check your hdfs-site.xml for the dfs.permissions.enabled property. By default it is true; if you set it to false, that may ruin your setup. A lot of people used this config, and I myself used it in several places. It was working perfectly.
This is a post that I wrote two years ago. A lot has changed since then, so this may not be a perfect solution now.
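For reference, keeping HDFS permission checking enabled looks like this in hdfs-site.xml (on older 1.x releases the property is named dfs.permissions instead):
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>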
Sorted… my dfs.permissions in hdfs-site.xml is set to true and it's all good now.
Hi Amal,
I am having issues running mrjob. I tried your suggestion and I am getting a permission error. Could you give some suggestions?
hduser@hadoop1:~$ groups hduser
hduser : hadoop sudo
hduser@hadoop1:~$ cat hadoop-2.5.0-cdh5.3.2/etc/hadoop/core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hdata</value>
</property>
hduser@hadoop1:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ cat hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
hduser@hadoop1:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$
hduser@hadoop1:~$ hdfs dfs -ls /
Found 2 items
drwxrwxrwx – hduser supergroup 0 2015-08-18 14:17 /inputwords
drwxrwxrwt – hduser supergroup 0 2015-08-18 15:57 /tmp
hduser@hadoop1:~$ ls -l | grep hdata
drwxrwxrwx 4 1777 hadoop 4096 Aug 18 20:44 hdata
hduser@hadoop1:~$
hduser@hadoop1:~$ hadoop fs -chmod -R 1777 /home/hduser/hdata
chmod: `/home/hduser/hdata’: No such file or directory
hduser@hadoop1:~$ sudo python mrjob-0.4/mrjob/examples/mr_word_freq_count.py mrjob-0.4/README.rst -r hadoop –hadoop-bin /home/hduser/hadoop-2.5.0-cdh5.3.2/bin -o hdfs:///tmp
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mr_word_freq_count.root.20150819.035017.487821
Traceback (most recent call last):
File “mrjob-0.4/mrjob/examples/mr_word_freq_count.py”, line 37, in
MRWordFreqCount.run()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/job.py”, line 483, in run
mr_job.execute()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/job.py”, line 501, in execute
super(MRJob, self).execute()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/launch.py”, line 146, in execute
self.run_job()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/launch.py”, line 207, in run_job
runner.run()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/runner.py”, line 450, in run
self._run()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 241, in _run
self._upload_local_files_to_hdfs()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 267, in _upload_local_files_to_hdfs
self._mkdir_on_hdfs(self._upload_mgr.prefix)
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/hadoop.py”, line 275, in _mkdir_on_hdfs
self.invoke_hadoop([‘fs’, ‘-mkdir’,’-p’, path])
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.4-py2.7.egg/mrjob/fs/hadoop.py”, line 81, in invoke_hadoop
proc = Popen(args, stdout=PIPE, stderr=PIPE)
File “/usr/lib/python2.7/subprocess.py”, line 710, in __init__
errread, errwrite)
File “/usr/lib/python2.7/subprocess.py”, line 1327, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
hduser@hadoop1:~$
Don't keep any application storage inside a user's home directory. Change /home/hduser/hdata to /app/hadoop/tmp. Before this, you have to create the directory both locally and in HDFS.
hadoop fs -mkdir -p /app/hadoop/tmp
hadoop fs -chmod -R 777 /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chmod -R 777 /app/hadoop/tmp
Thx Amal.
Hi Amal,
I am getting the error below. Not sure why it's still giving me an error. Could you help?
hduser@hadoop1:~$ hadoop fs -ls /user/training
Found 1 items
-rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 /user/training/top-200.txt
hduser@hadoop1:~$ hdfs dfs -ls /user/training
Found 1 items
-rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 /user/training/top-200.txt
hduser@hadoop1:~$
hduser@hadoop1:~$ python Topbugc3count.py -r hadoop -v hdfs:///user/training/top-200.txt
Deprecated option hdfs_scratch_dir has been renamed to hadoop_tmp_dir
Unexpected option hdfs_tmp_dir
looking for configs in /home/hduser/.mrjob.conf
using configs in /home/hduser/.mrjob.conf
Active configuration:
{‘bootstrap_mrjob’: None,
‘check_input_paths’: True,
‘cleanup’: [‘ALL’],
‘cleanup_on_failure’: [‘NONE’],
‘cmdenv’: {},
‘hadoop_bin’: None,
‘hadoop_extra_args’: [],
‘hadoop_home’: ‘/usr/local/hadoop’,
‘hadoop_streaming_jar’: None,
‘hadoop_tmp_dir’: ‘tmp/mrjob’,
‘hadoop_version’: ‘0.20’,
‘interpreter’: None,
‘jobconf’: {},
‘label’: None,
‘local_tmp_dir’: ‘/tmp’,
‘owner’: ‘hduser’,
‘python_archives’: [],
‘python_bin’: None,
‘setup’: [],
‘setup_cmds’: [],
‘setup_scripts’: [],
‘sh_bin’: [‘sh’, ‘-ex’],
‘steps_interpreter’: None,
‘steps_python_bin’: None,
‘strict_protocols’: True,
‘upload_archives’: [],
‘upload_files’: []}
Hadoop streaming jar is /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar
> /usr/local/hadoop/bin/hadoop fs -ls hdfs:///user/training/top-200.txt
STDOUT: -rwxrwxr-x 1 hduser supergroup 9838 2015-09-02 11:34 hdfs:///user/training/top-200.txt
STDERR: 15/09/02 15:59:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
creating tmp directory /tmp/Topbugc3count.hduser.20150902.225937.231068
archiving /usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob -> /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz as mrjob/
writing wrapper script to /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh
WRAPPER: # store $PWD
WRAPPER: __mrjob_PWD=$PWD
WRAPPER:
WRAPPER: # obtain exclusive file lock
WRAPPER: exec 9>/tmp/wrapper.lock.Topbugc3count.hduser.20150902.225937.231068
WRAPPER: python -c ‘import fcntl; fcntl.flock(9, fcntl.LOCK_EX)’
WRAPPER:
WRAPPER: # setup commands
WRAPPER: {
WRAPPER: export PYTHONPATH=$__mrjob_PWD/mrjob.tar.gz:$PYTHONPATH
WRAPPER: } 0&2
WRAPPER:
WRAPPER: # release exclusive file lock
WRAPPER: exec 9>&-
WRAPPER:
WRAPPER: # run task from the original working directory
WRAPPER: cd $__mrjob_PWD
WRAPPER: “$@”
Making directory hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/ on HDFS
> /usr/local/hadoop/bin/hadoop version
Using Hadoop version 2.6.0
> /usr/local/hadoop/bin/hadoop fs -mkdir -p hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/
STDERR: 15/09/02 15:59:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Copying local files into hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/
Uploading /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh on HDFS
> /usr/local/hadoop/bin/hadoop fs -put /tmp/Topbugc3count.hduser.20150902.225937.231068/setup-wrapper.sh hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh
STDERR: 15/09/02 15:59:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Uploading /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz on HDFS
> /usr/local/hadoop/bin/hadoop fs -put /tmp/Topbugc3count.hduser.20150902.225937.231068/mrjob.tar.gz hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz
STDERR: 15/09/02 15:59:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Uploading /home/hduser/Topbugc3count.py -> hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py on HDFS
> /usr/local/hadoop/bin/hadoop fs -put /home/hduser/Topbugc3count.py hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py
STDERR: 15/09/02 15:59:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
> /usr/bin/python /home/hduser/Topbugc3count.py –steps
running step 1 of 1
> /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -files ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’ -archives ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’ -input hdfs:///user/training/top-200.txt -output hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output -mapper ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’ -combiner ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’ -reducer ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’
HADOOP: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
HADOOP: session.id is deprecated. Instead, use dfs.metrics.session-id
HADOOP: Initializing JVM Metrics with processName=JobTracker, sessionId=
HADOOP: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= – already initialized
HADOOP: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hduser797418092/.staging/job_local797418092_0001
HADOOP: Error launching job , bad input path : File does not exist: /app/hadoop/tmp/mapred/staging/hduser797418092/.staging/job_local797418092_0001/archives/mrjob.tar.gz#mrjob.tar.gz
HADOOP: Streaming Command Failed!
Job failed with return code 512: [‘/usr/local/hadoop/bin/hadoop’, ‘jar’, ‘/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar’, ‘-files’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’, ‘-archives’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’, ‘-input’, ‘hdfs:///user/training/top-200.txt’, ‘-output’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output’, ‘-mapper’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’, ‘-combiner’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’, ‘-reducer’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’]
Scanning logs for probable cause of failure
Traceback (most recent call last):
File “Topbugc3count.py”, line 20, in
Topbugc3count.run()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/job.py”, line 433, in run
mr_job.execute()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/job.py”, line 451, in execute
super(MRJob, self).execute()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/launch.py”, line 160, in execute
self.run_job()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/launch.py”, line 227, in run_job
runner.run()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/runner.py”, line 452, in run
self._run()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/hadoop.py”, line 235, in _run
self._run_job_in_hadoop()
File “/usr/local/lib/python2.7/dist-packages/mrjob-0.5.0_dev-py2.7.egg/mrjob/hadoop.py”, line 372, in _run_job_in_hadoop
raise CalledProcessError(returncode, step_args)
subprocess.CalledProcessError: Command ‘[‘/usr/local/hadoop/bin/hadoop’, ‘jar’, ‘/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar’, ‘-files’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/Topbugc3count.py#Topbugc3count.py,hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/setup-wrapper.sh#setup-wrapper.sh’, ‘-archives’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/files/mrjob.tar.gz#mrjob.tar.gz’, ‘-input’, ‘hdfs:///user/training/top-200.txt’, ‘-output’, ‘hdfs:///user/hduser/tmp/mrjob/Topbugc3count.hduser.20150902.225937.231068/output’, ‘-mapper’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –mapper’, ‘-combiner’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –combiner’, ‘-reducer’, ‘sh -ex setup-wrapper.sh python Topbugc3count.py –step-num=0 –reducer’]’ returned non-zero exit status 512
hduser@hadoop1:~$
Hi Amal,
Thanks for the post on security. But I have a basic question. In this article you are creating users on the Linux hosts where the Hadoop cluster is deployed. How does the Hadoop cluster recognize users that were created on a Linux host? Where is this user and group information stored: in the cluster or in its configuration?
Thanks
M
We can use Linux users or LDAP users as Hadoop users. By default Hadoop uses Linux users. This is not stored anywhere in Hadoop itself. Any user on the client machine with proper access to HDFS and YARN can access Hadoop. By default all users are allowed to access Hadoop; we can also restrict access to certain users/groups. All the directories and files in HDFS are associated with an owner, a group, and a set of permissions, and these permissions are stored in the namenode metadata.
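As a quick illustration (the listing below is hypothetical), the owner, group, and permission bits of every HDFS path are visible with a simple listing:
hadoop fs -ls /user
drwxr-xr-x   - anand hadoop          0 2015-05-17 14:46 /user/anand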
Hi Amal, to access my HDFS I need to kinit first. I have a few users who access HDFS through Hue, and I want to prevent those users from using HDFS through PuTTY. Is there any way to restrict a user from accessing HDFS through Linux? (E.g., a script which allows the user to kinit and then closes the PuTTY session automatically, so that the user can access HDFS only through Hue.)
Hi,
How did you configure the login? Are you using LDAP? In that case you don't have to worry: the ticket refresh happens at login itself. If this is not happening, you can create a cron job to renew the ticket, and in this way you can avoid the headache of doing the kinit manually. Set a constant ticket cache file and set the property in the user's bashrc.
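A rough sketch of what I mean (the paths, principal, and schedule below are only examples and depend on your Kerberos setup):
# in the user's .bashrc, point the Kerberos tools to a fixed ticket cache
export KRB5CCNAME=/tmp/krb5cc_anand
# crontab entry to renew the ticket every 8 hours from a keytab
0 */8 * * * kinit -kt /home/anand/anand.keytab anand@EXAMPLE.COM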
hduser@hadoopmaster:/$ hadoop fs -chmod -R 777 /app/hdaoop/tmp
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It’s highly recommended that you fix the library with ‘execstack -c ‘, or link it with ‘-z noexecstack’.
16/02/18 08:01:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
chmod: `/app/hdaoop/tmp’: No such file or directory
Seems you made a spelling mistake. It is /app/hadoop/tmp.
How can I get the list of users and groups?
You cannot list Hadoop users and groups directly. Hadoop users and groups are Unix users with Hadoop permissions, so there is no command to list the users/groups that have access to Hadoop.
Hadoop has become one of the most useful skills for handling big data, which is why many people are trying to learn Hadoop, and tutorials such as this make the learning process much easier and more fun. Thanks for sharing this great article.