MapReduce series (3): submitting an MR program remotely from Windows
Source: Internet  Editor: 程序博客网  Time: 2024/06/08 10:06
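The previous post showed that running a MapReduce program in local mode on Windows works without any problems.
Going one step further, I now want to run the program from my IDEA project directly on the Linux cluster, so that my local machine acts as a MapReduce client.
Following this idea, we set the conf as follows: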
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "mini01");
conf.set("fs.defaultFS", "hdfs://mini01:9000/");
Running it produces the following error:
17/03/17 19:02:22 INFO client.RMProxy: Connecting to ResourceManager at mini01/192.168.153.11:8032
17/03/17 19:02:22 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/03/17 19:02:22 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
17/03/17 19:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/03/17 19:02:22 INFO mapreduce.JobSubmitter: number of splits:1
17/03/17 19:02:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489496419130_0002
17/03/17 19:02:37 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
17/03/17 19:05:18 INFO impl.YarnClientImpl: Submitted application application_1489496419130_0002
17/03/17 19:09:19 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002
Exception in thread "main" java.io.IOException: Failed to run job : Application application_1489496419130_0002 failed 2 times due to AM Container for appattempt_1489496419130_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://mini01:8088/proxy/application_1489496419130_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1489496419130_0002_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:241)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
    at wc.WordCountRunner.main(WordCountRunner.java:78)
From this we can tell that the error is caused by the shell script set up in YARNRunner.
Stepping through the source leads to submitJob() in YARNRunner:
// Construct necessary information to start the MR AM
ApplicationSubmissionContext appContext =
    createApplicationSubmissionContext(conf, jobSubmitDir, ts);
The contents of appContext are as follows:
application_id {
  id: 2
  cluster_timestamp: 1489496419130
}
application_name: "N/A"
queue: "default"
am_container_spec {
  localResources {
    key: "jobSubmitDir/job.splitmetainfo"
    value {
      resource {
        scheme: "hdfs"
        host: "mini01"
        port: 9000
        file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.splitmetainfo"
      }
      size: 27
      timestamp: 1489698635869
      type: FILE
      visibility: APPLICATION
    }
  }
  localResources {
    key: "jobSubmitDir/job.split"
    value {
      resource {
        scheme: "hdfs"
        host: "mini01"
        port: 9000
        file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.split"
      }
      size: 112
      timestamp: 1489698635836
      type: FILE
      visibility: APPLICATION
    }
  }
  localResources {
    key: "job.xml"
    value {
      resource {
        scheme: "hdfs"
        host: "mini01"
        port: 9000
        file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.xml"
      }
      size: 88715
      timestamp: 1489698636066
      type: FILE
      visibility: APPLICATION
    }
  }
  tokens: "HDTS\000\000\001\025MapReduceShuffleToken\b\213\023`\302+\213\302`"
  environment {
    key: "HADOOP_CLASSPATH"
    value: "%PWD%;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*;null"
  }
  environment {
    key: "SHELL"
    value: "/bin/bash"
  }
  environment {
    key: "CLASSPATH"
    value: "%PWD%;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\lib\\*;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*"
  }
  environment {
    key: "LD_LIBRARY_PATH"
    value: "%PWD%"
  }
  command: "%JAVA_HOME%/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr "
  application_ACLs {
    accessType: APPACCESS_VIEW_APP
    acl: " "
  }
  application_ACLs {
    accessType: APPACCESS_MODIFY_APP
    acl: " "
  }
}
cancel_tokens_when_complete: true
maxAppAttempts: 2
resource {
  memory: 1536
  virtual_cores: 1
}
applicationType: "MAPREDUCE"
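You can see that Windows-style path syntax was passed to Linux unchanged: the "%" variable markers, the ";" separators and, in CLASSPATH, backslashes. In bash, a word that starts with "%" is a job specification, so the launch command "%JAVA_HOME%/bin/java ..." is treated as an attempt to bring a job to the foreground, which fails with "fg: no job control" in a non-interactive shell. The fix is to copy this class into your own project and change how these paths are built (the package name and path name must not change in any way).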
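To spot which values in a dump like the one above would trip up bash, a quick check can help. The following is only a sketch: WindowsSyntaxCheck and hasWindowsSyntax are hypothetical names that are not part of Hadoop, and the rule (a %VAR% reference, a ";" or a backslash) is an assumption drawn from the values seen in this dump.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: flags strings that still carry
// Windows-style syntax and would therefore be misread by bash on Linux.
class WindowsSyntaxCheck {

    // Matches a Windows-style variable reference such as %JAVA_HOME%.
    private static final Pattern WINDOWS_VAR =
            Pattern.compile("%[A-Za-z_][A-Za-z0-9_]*%");

    static boolean hasWindowsSyntax(String value) {
        return WINDOWS_VAR.matcher(value).find()
                || value.indexOf(';') >= 0    // Windows path-list separator
                || value.indexOf('\\') >= 0;  // Windows directory separator
    }

    public static void main(String[] args) {
        // Sample values taken from the appContext dump.
        String[] samples = {
            "%JAVA_HOME%/bin/java",
            "%PWD%;job.jar/job.jar",
            "/bin/bash"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + hasWindowsSyntax(s));
        }
    }
}
```

In this dump the command and every environment entry except SHELL are flagged, which matches the places changed in YARNRunner.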
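Copy the file org.apache.hadoop.mapred.YARNRunner.java unchanged into your own src directory; the package name and path name must not change in any way.
Then modify the following places:
First change: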
// Setup the command to run the AM
List<String> vargs = new ArrayList<String>(8);
// TODO: the original line below is commented out
// vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
// TODO: source modified by tianjun
System.out.println(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
System.out.println("$JAVA_HOME/bin/java");
vargs.add("$JAVA_HOME/bin/java");
Second change:
// TODO: source modified by tianjun
// Rewrite every environment value from Windows syntax to Linux syntax.
for (String key : environment.keySet()) {
    String org = environment.get(key);
    String linux = getLinux(org);
    environment.put(key, linux);
}
// Setup ContainerLaunchContext for AM container
ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
    localResources, environment, vargsFinal, null, securityTokens, acls);
Add the getLinux() function used above:
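// TODO: added by tianjun
// Converts Windows-style syntax to Linux syntax: %VAR% -> $VAR, ';' -> ':', '\' -> '/'.
private String getLinux(String org) {
    StringBuilder sb = new StringBuilder();
    int c = 0; // number of '%' characters seen so far
    for (int i = 0; i < org.length(); i++) {
        if (org.charAt(i) == '%') {
            c++;
            // The opening '%' of a pair becomes '$'; the closing '%' is dropped.
            if (c % 2 == 1) {
                sb.append("$");
            }
        } else {
            switch (org.charAt(i)) {
                case ';':
                    sb.append(":");
                    break;
                case '\\':
                    sb.append("/");
                    break;
                default:
                    sb.append(org.charAt(i));
                    break;
            }
        }
    }
    return (sb.toString());
}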
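The conversion rules of getLinux() can be tried outside Hadoop. The sketch below copies the same character-level logic into a standalone class and applies it to values from the appContext dump. EnvSyntaxConverter, toLinux and the key MAPRED_PART are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone illustration of the getLinux() rules; not part of Hadoop.
class EnvSyntaxConverter {

    // The opening '%' of each %VAR% pair becomes '$', the closing '%' is
    // dropped, ';' becomes ':' and '\' becomes '/'.
    static String toLinux(String org) {
        StringBuilder sb = new StringBuilder();
        int percentCount = 0;
        for (int i = 0; i < org.length(); i++) {
            char ch = org.charAt(i);
            if (ch == '%') {
                percentCount++;
                if (percentCount % 2 == 1) {
                    sb.append('$');
                }
            } else if (ch == ';') {
                sb.append(':');
            } else if (ch == '\\') {
                sb.append('/');
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values copied from the dump; MAPRED_PART is a made-up key holding
        // one fragment of the CLASSPATH entry.
        Map<String, String> env = new LinkedHashMap<>();
        env.put("LD_LIBRARY_PATH", "%PWD%");
        env.put("HADOOP_CLASSPATH",
                "%PWD%;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*;null");
        env.put("MAPRED_PART", "%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*");

        for (Map.Entry<String, String> e : env.entrySet()) {
            System.out.println(e.getKey() + " = " + toLinux(e.getValue()));
        }
    }
}
```

The rules are purely textual: every ";" and backslash is rewritten, and the conversion only behaves well when "%" characters come in pairs. That is enough for the values in this dump, but a value with a literal "%" or ";" would be changed as well.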
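One more point needs special attention:
The driver class must call setJar with the absolute path of the job jar. setJarByClass works by locating the jar that the given class was loaded from. When the job is started on the cluster with the hadoop jar command, that jar is on the classpath and can be found. Here the client runs on Windows from the IDE rather than on the Linux cluster, so no job jar is set (note the "No job jar file set" warning in the log above) and the job fails with a mapper class not found error.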
wcjob.setJar("F:/myWorkPlace/java/dubbo/demo/dubbo-demo/mr-demo1/target/mr.demo-1.0-SNAPSHOT.jar");
// When submitting from the local machine, setJarByClass does not work; use setJar instead.
// wcjob.setJarByClass(WordCountRunner.class);
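Because the jar path is hard-coded, a stale or mistyped path only shows up after the job has been submitted. One option is to check the path on the client first. This is a minimal sketch that assumes the jar has already been built locally; JobJarCheck and requireJobJar are hypothetical names, not Hadoop APIs.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical client-side check, not part of Hadoop.
class JobJarCheck {

    // Returns the absolute, normalized path of the jar, or fails early
    // if the file does not exist on the submitting machine.
    static String requireJobJar(String jarPath) throws FileNotFoundException {
        Path p = Paths.get(jarPath).toAbsolutePath().normalize();
        if (!Files.isRegularFile(p)) {
            throw new FileNotFoundException("Job jar not found: " + p);
        }
        return p.toString();
    }

    public static void main(String[] args) throws Exception {
        // A temporary file stands in for the real build output.
        Path tmp = Files.createTempFile("mr-demo", ".jar");
        try {
            System.out.println("Using job jar: " + requireJobJar(tmp.toString()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

In the driver, the result would then be passed to setJar, for example wcjob.setJar(JobJarCheck.requireJobJar(path)), where path is the location of your own build output.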