Setting Up a Hadoop Development Environment on Windows and Submitting Jobs Remotely


This article explains how to set up a Hadoop development environment and describes in detail how to develop Hadoop MapReduce programs in IntelliJ IDEA and submit them to a remote cluster.
Prerequisites:

  • Download Hadoop to the local machine. It does not need to be configured or installed, but HADOOP_HOME, JAVA_HOME, etc. must be set.
  • Download winutils and extract it into the $HADOOP_HOME/bin directory.
  • If the cluster configuration refers to nodes by hostname, add the cluster hostname mappings to your local hosts file (optional, but less convenient without them; see the example below).
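
For instance (a sketch using this article's cluster address and the Hadoop path that appears later in the error logs; substitute your own values), C:\Windows\System32\drivers\etc\hosts would gain a line such as:

192.168.89.135 master

and the environment variables would look something like:

HADOOP_HOME=E:\hadoop-2.8.0
PATH=%PATH%;%HADOOP_HOME%\bin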

Method 1: Maven project

1. Creating a Maven project in IntelliJ IDEA needs no elaboration; just create one.
2. Configure the pom.xml file. Once it is filled in, IDEA automatically downloads and imports the jars.

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>
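
All seven Hadoop artifacts must stay on the same release (2.8.0 here). One optional refactoring, not from the original, is to centralize the version in a Maven property and reference it as <version>${hadoop.version}</version> in each dependency:

<properties>
    <hadoop.version>2.8.0</hadoop.version>
</properties>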

Method 2: Plain Java project

1. Create a Java project in IntelliJ IDEA.

2. Add the dependencies manually, e.g. via File => Project Structure => Libraries, pointing at the jars shipped under $HADOOP_HOME/share/hadoop and its subdirectories.


After the import succeeds, the jars appear under External Libraries.


3. Write the code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Deletes a directory on the FileSystem configured in conf, if it exists.
    private static void deleteDir(Configuration conf, String dirPath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path targetPath = new Path(dirPath);
        if (fs.exists(targetPath)) {
            boolean delResult = fs.delete(targetPath, true);
            if (delResult) {
                System.out.println(targetPath + " has been deleted successfully.");
            } else {
                System.out.println(targetPath + " deletion failed.");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /*
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        // delete the output directory first
        deleteDir(conf, otherArgs[otherArgs.length - 1]);
        */
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The program counts how many times each word occurs across all files in the directory given by the first argument.
The results are written to the directory given by the second argument, which is created automatically; make sure it does not exist before the run (or clear it programmatically, as sketched below).
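
The WordCount class above already includes a deleteDir helper for exactly this, used in the commented-out block in main. Uncommenting that block, or calling the helper directly as in this sketch, clears the output directory before each run:

// remove any previous output directory so reruns do not fail
deleteDir(conf, args[args.length - 1]);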

4、编辑configuration
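In Run => Edit Configurations, set the main class to WordCount and supply the input and output paths as Program arguments, e.g. (these local paths are illustrative):

E:\hadoop\input E:\hadoop\output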

5. Run the program; it should complete successfully.


Remote configuration

Create a Resource directory and mark it as the project's Resources root.


Add a core-site.xml file to the Resource directory:


<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.89.135:9000</value>
    </property>
</configuration>

It can be copied directly from the Hadoop configuration files on the cluster.

Modify the run configuration:
change the input and output paths to remote HDFS addresses.
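For example (the host and port match the fs.defaultFS value above; the /input and /output paths are illustrative):

hdfs://192.168.89.135:9000/input hdfs://192.168.89.135:9000/output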

Local submission

If your Hadoop and IDEA are on the same machine, you can choose local submission.
1. Copy core-site.xml and log4j.properties into the project's source root, so that after compilation both files can be found in the class output directory. Why? Because a job submitted directly from IDEA loads the configuration files from the classpath. Without log4j.properties you get a "log4j not initialized" warning and no job progress is printed; likewise, without core-site.xml on the classpath you run into HDFS permission errors and the like.
2. Run the class's main method directly in IDEA; the job is submitted to the local pseudo-distributed Hadoop installation, and the code can be debugged.
3. Note: although mapred-site.xml in the Hadoop configuration specifies YARN as the framework, debugging shows that job submission actually goes through LocalJobRunner, not YARN. There are two reasons:
Reason 1: mapred-site.xml and yarn-site.xml must also be placed in the resource folder.
Reason 2: the program must be packaged into a jar before it can be submitted remotely; see the next section, Remote submission.

Remote submission

If your Hadoop is a cluster or lives on a different server from IDEA, you can choose remote submission; in hadoop-2.8.0, scheduling goes through YARN.
1. Put core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, log4j.properties, etc., into the resource directory. If these files are not added, the corresponding settings must be made in code:

conf.set("mapreduce.job.jar", "E:\\hadoop\\myhadoop\\out\\artifacts\\wordcount\\wordcount.jar");//指定Jar包,也可以在job中设置conf.set("mapreduce.framework.name", "yarn");//以yarn形式提交conf.set("yarn.resourcemanager.hostname", "master");conf.set("mapreduce.app-submission.cross-platform", "true");//跨平台提交

If the cluster restricts HDFS access, e.g. only a designated user xxx is allowed in, you can set the user inside the program (do this before any FileSystem or Job object is created):

System.setProperty("HADOOP_USER_NAME", "xxx")

2. Package the project first, using Maven or IDEA's artifact build.

  • Maven:
mvn package
  • IDEA artifact build:

    Because the cluster already has the Hadoop environment, there is no need to bundle dependencies into the jar; choose Empty. This also keeps the Build fast while debugging.

    Project Structure => Artifacts => click + in the top left => Empty => Output Layout + => Module Output => select the project folder => click the jar and set the Main Class.

3. Set job.setJar in the program code (this matches the mapreduce.job.jar setting shown above):

job.setJar("E:\\hadoop\\myhadoop\\out\\artifacts\\wordcount\\wordcount.jar");

4. The program also talks to port 10020, the Hadoop job history service, which must be started on the server:

mr-jobhistory-daemon.sh start historyserver & # start the history server
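
Port 10020 is the default for mapreduce.jobhistory.address. If the cluster uses a different host or port, set it explicitly in mapred-site.xml (the hostname master is this article's example):

<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>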

5. Run the program in IDEA to submit the job. This submission method also lets you debug the source code from within IDEA.

6. Automatically deploy the jar to the cluster (optional)
Tools -> Deployment -> Configuration, click + in the top left, choose SFTP as the Type, then configure the server IP, deployment path, username, password, and so on. If you then enable automatic deployment, every change is uploaded to the server automatically; you can also right-click and choose Deployment => Upload to …

Common problems:

Problem 1:

Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: Could not locate Hadoop executable: E:\hadoop-2.8.0\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:716)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:250)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:267)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:771)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:515)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:555)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:533)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:313)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
    at WordCount.main(WordCount.java:92)
Caused by: java.io.FileNotFoundException: Could not locate Hadoop executable: E:\hadoop-2.8.0\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getQualifiedBinInner(Shell.java:598)
    at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:572)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:669)
    at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:441)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at WordCount.main(WordCount.java:71)
Process finished with exit code 1

Solution: put winutils.exe in the $HADOOP_HOME/bin directory.

Problem 2:

2017-08-04 12:31:00,668 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-08-04 12:31:01,230 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1181)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-08-04 12:31:01,230 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(79)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-08-04 12:31:01,495 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-08-04 12:31:01,542 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(289)) - Total input files to process : 1
2017-08-04 12:31:01,870 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(200)) - number of splits:1
2017-08-04 12:31:02,104 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1047774324_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:606)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:958)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:203)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:314)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:377)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:171)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:758)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:242)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
    at java.security.AccessController.doPrivileged(Native Method)
2017-08-04 12:31:02,167 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(251)) - Cleaning up the staging area file:/tmp/hadoop/mapred/staging/alex1047774324/.staging/job_local1047774324_0001
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
    at WordCount.main(WordCount.java:92)

Solution: hadoop.dll is missing; put hadoop.dll in the $HADOOP_HOME/bin directory.

Problem 3:

2017-08-04 12:47:49,125 INFO  [main] ipc.Client (Client.java:handleConnectionTimeout(897)) - Retrying connect to server: master/192.168.89.135:9000. Already tried 0 time(s); maxRetries=45

Solution: Hadoop has not been started on the remote host. If it has, check whether firewalld.service and iptables.service have been shut down, as shown below.
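
For example, on a CentOS 7 host (assuming systemd; older iptables-based systems use the service command instead):

systemctl stop firewalld     # stop the firewall for this session
systemctl disable firewalld  # keep it off across reboots
service iptables stop        # older init-based systems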

Problem 4:

2017-11-29 21:10:22,214 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(123)) - Connecting to ResourceManager at master/192.168.89.136:8032
2017-11-29 21:10:23,259 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(289)) - Total input files to process : 1
2017-11-29 21:10:24,216 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(200)) - number of splits:1
2017-11-29 21:10:24,769 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_1511957984981_0007
2017-11-29 21:10:24,984 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(296)) - Submitted application application_1511957984981_0007
2017-11-29 21:10:25,024 INFO  [main] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://master:8088/proxy/application_1511957984981_0007/
2017-11-29 21:10:25,024 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1390)) - Running job: job_1511957984981_0007
2017-11-29 21:10:28,088 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1411)) - Job job_1511957984981_0007 running in uber mode : false
2017-11-29 21:10:28,090 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1418)) -  map 0% reduce 0%
2017-11-29 21:10:28,164 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1511957984981_0007 failed with state FAILED due to: Application application_1511957984981_0007 failed 2 times due to AM Container for appattempt_1511957984981_0007_000002 exited with  exitCode: 1
Failing this attempt. Diagnostics: Exception from container-launch.
Container id: container_1511957984981_0007_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1511957984981_0007 Then click on links to logs of each attempt.. Failing the application.
2017-11-29 21:10:28,199 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1436)) - Counters: 0
Process finished with exit code 1

This is caused by the cross-platform mismatch between Windows and the remote Linux cluster.

Solution: add the following to the code:

conf.set("mapreduce.app-submission.cross-platform", "true");