User management issues when Oozie interacts with Hadoop



I hit a problem in the sandbox environment that took quite a while to debug. Fundamentally, it came down to my own unfamiliarity with Hadoop user management.


Symptom

Exception:

java.lang.ArrayIndexOutOfBoundsException: 0

Error log:

2017-06-10 20:24:06,816  WARN ActionStartXCommand:523 - SERVER[NM-304-SA5212M4-BIGDATA-623] USER[Delta201702_02] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000006-170610193325422-oozie-oozi-W] ACTION[0000006-170610193325422-oozie-oozi-W@shell-4a1e] Error starting action [shell-4a1e]. ErrorType [ERROR], ErrorCode [ArrayIndexOutOfBoundsException], Message [ArrayIndexOutOfBoundsException: 0]
org.apache.oozie.action.ActionExecutorException: ArrayIndexOutOfBoundsException: 0
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:445)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1008)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1162)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:234)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus(ViewFileSystem.java:771)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.getFileStatus(ViewFileSystem.java:359)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.checkPermissionOfOther(ClientDistributedCacheManager.java:275)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.ancestorsHaveExecutePermissions(ClientDistributedCacheManager.java:256)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.isPublic(ClientDistributedCacheManager.java:243)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineCacheVisibilities(ClientDistributedCacheManager.java:162)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:58)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:993)
    ... 10 more

Troubleshooting approach:

When I first saw this error I had almost no leads; the exception itself carries very little information.
1. Search the web for the error (nothing relevant turned up; material on Oozie is genuinely scarce online).
2. Read the source code along the exception stack.
org.apache.hadoop.fs.viewfs.ViewFileSystem$InternalDirOfViewFs.getFileStatus:

@Override
public FileStatus getFileStatus(Path f) throws IOException {
  checkPathIsSlash(f);
  return new FileStatus(0, true, 0, 0, creationTime, creationTime,
      PERMISSION_555, ugi.getUserName(), ugi.getGroupNames()[0],
      new Path(theInternalDir.fullPath).makeQualified(
          myUri, ROOT_PATH));
}

Given the exception message, java.lang.ArrayIndexOutOfBoundsException: 0, the culprit is almost certainly this expression: ugi.getGroupNames()[0]. The problem is that the UGI cannot obtain any group information, so the group array is empty.
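The failure mode is easy to reproduce in isolation: indexing element 0 of an empty group array throws exactly this exception. A minimal sketch in plain Java (not Hadoop code; the class and variable names are illustrative):

```java
// Minimal reproduction: a UGI with no resolvable groups returns an empty
// group-name array, and reading element 0 throws ArrayIndexOutOfBoundsException,
// which is what surfaces in the Oozie log above.
public class EmptyGroupsDemo {
    public static void main(String[] args) {
        String[] groupNames = new String[0]; // stand-in for ugi.getGroupNames()
        try {
            System.out.println("primary group: " + groupNames[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```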

HDFS user management:
HDFS user information can come either from the client-side configuration file core-site.xml or from the login user of the current operating system, with core-site.xml taking precedence. The relevant property is hadoop.job.ugi, whose value can be one user name and one group, or one user name and multiple groups. If no login-user information is configured, Hadoop automatically runs the shell commands whoami and bash -c groups to obtain the current user name and the user's groups, respectively.
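The fallback described above can be observed directly. The sketch below shells out the same way to illustrate the mechanism; it is an assumption-level illustration, not Hadoop's actual shell-based group-mapping code:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class OsUserLookup {
    // Run a command and return the first line of its stdout ("" if none).
    static String firstLine(String... cmd) throws Exception {
        Process p = new ProcessBuilder(cmd).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line == null ? "" : line;
        }
    }

    public static void main(String[] args) throws Exception {
        // The same commands Hadoop falls back to for identity resolution:
        String user = firstLine("whoami");
        String groups = firstLine("bash", "-c", "groups");
        System.out.println("user  : " + user);
        System.out.println("groups: " + groups);
        // If `groups` prints nothing for this user, Hadoop ends up with an
        // empty group array -- exactly the failing case analyzed above.
    }
}
```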

That pins down the cause: the user who submitted the Oozie job had no user (and therefore no group) information on the machine doing the submission, so the launcher job failed.


Solution

After creating the corresponding user on the Oozie server, the job ran normally.

Open questions:

1. With this approach, every new user must be mirrored onto the Oozie server, and under Oozie HA every server must be kept in sync.
2. Oozie, as a super user, can proxy other users through Kerberos authentication, but the job still executes as the submitting user.
3. If a single oozie user is configured in core-site.xml, how would individual submitters be distinguished? On the YARN side, every job would then appear to run as oozie.
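For reference, the proxy-user mechanism mentioned in point 2 is enabled on the Hadoop side with core-site.xml properties of the following shape (the values here are placeholders; production setups should restrict hosts and groups rather than use wildcards):

```xml
<!-- core-site.xml on the NameNode / ResourceManager side -->
<property>
  <!-- hosts from which the oozie user is allowed to impersonate others;
       "oozie-server-host" is a placeholder -->
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>oozie-server-host</value>
</property>
<property>
  <!-- groups whose members the oozie user may impersonate -->
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>
```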
