Setting Up a Hadoop Development Environment on Windows and Submitting Jobs Remotely
This article describes how to set up a Hadoop development environment, and walks through developing a Hadoop MapReduce program in IntelliJ IDEA and submitting it to a remote cluster.
Prerequisites:
- Download Hadoop on your local machine. No configuration changes or installation are needed, but you must set HADOOP_HOME, JAVA_HOME, and so on.
- Download winutils and place the extracted files in the $HADOOP_HOME/bin directory.
- If the cluster configuration refers to nodes by hostname, add entries for the cluster hosts to your local hosts file (optional, but more convenient).
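For reference, the resulting setup on the Windows machine might look like the following (the paths and IP address are examples only, not required values):

```
HADOOP_HOME=E:\hadoop-2.8.0
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_131
PATH=%PATH%;%HADOOP_HOME%\bin;%JAVA_HOME%\bin

# C:\Windows\System32\drivers\etc\hosts
192.168.89.135  master
```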
Method 1: Maven project
1. Create a Maven project in IntelliJ IDEA; the details are not covered here.
2. Edit pom.xml. Once the dependencies below are added, IDEA automatically downloads and imports the JARs.
```xml
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.8.0</version>
    </dependency>
</dependencies>
```
Method 2: plain Java project
1. Create a Java project in IntelliJ IDEA.
2. Add the Hadoop JARs as module dependencies.
After the import succeeds:
3. Write the code.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Delete the output directory if it already exists, so the job can be rerun.
    private static void deleteDir(Configuration conf, String dirPath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path targetPath = new Path(dirPath);
        if (fs.exists(targetPath)) {
            boolean delResult = fs.delete(targetPath, true);
            if (delResult) {
                System.out.println(targetPath + " has been deleted successfully.");
            } else {
                System.out.println(targetPath + " deletion failed.");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /*
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        // Delete the output directory first
        deleteDir(conf, otherArgs[otherArgs.length - 1]);
        */
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
The program counts how many times each word occurs across all files in the directory given by the first argument.
The results are written to the directory given by the second argument; that directory is created automatically, so make sure it does not exist before the run.
4. Edit the run configuration.
5. Run the program; it should complete successfully.
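To make the mapper and reducer logic above concrete, here is a plain-Java sketch (no Hadoop involved; `LocalWordCount` is a name invented for illustration) of what the map and reduce steps compute together:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Tokenize the text like TokenizerMapper does, then sum a count of 1
    // per occurrence, which is exactly what IntSumReducer does per key.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello hadoop hello idea"));
        // prints {hello=2, hadoop=1, idea=1}
    }
}
```

In the real job, the framework additionally groups the mapper output by key across all input splits before the reducer's summation runs.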
Remote configuration
Create a resources directory and mark it as the project's Resources root.
Add a core-site.xml file to the resources directory:
```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.89.135:9000</value>
    </property>
</configuration>
```
You can copy it directly from the Hadoop cluster's own configuration files.
Then modify the run configuration:
change the input and output file paths to remote HDFS addresses.
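For example, the program arguments in the run configuration might then look like this (the IP and paths are illustrative; use your own NameNode address):

```
hdfs://192.168.89.135:9000/input hdfs://192.168.89.135:9000/output
```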
Local submission
If your Hadoop installation and IDEA are on the same machine, you can submit jobs locally.
1. Copy core-site.xml and log4j.properties into the project's source root (so that, after compilation, both files can be found in the class output directory). Why? When you submit a job directly from IDEA, configuration files are loaded from the class directory. Without log4j.properties, log4j warns that it is not initialized and no job information gets printed. The same goes for core-site.xml: if it is not under the source root, you will run into HDFS permission errors and the like.
2. Run the class's main method directly in IDEA; the job is submitted to the local pseudo-distributed Hadoop installation, and you can debug the code.
3. Note: although mapred-site.xml in the Hadoop configuration specifies YARN scheduling, debugging shows the job is actually submitted through the local runner (LocalCluster), not YARN. There are two reasons:
[Reason 1] mapred-site.xml and yarn-site.xml must also be placed in the resources directory.
[Reason 2] the program must be packaged into a JAR before it can be submitted remotely; see the next section, Remote submission.
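As mentioned in step 1, log4j needs a configuration file on the classpath. A minimal log4j.properties could look like this (a sketch; any valid log4j 1.x configuration works):

```
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```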
Remote submission
If your Hadoop runs on a cluster or on another server, with IDEA on a different machine, you can submit jobs remotely; in Hadoop 2.8.0, scheduling goes through YARN.
1. Put core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, log4j.properties, and related files into the resources directory. If you do not add these files, the corresponding settings have to be made in code:
```java
conf.set("mapreduce.job.jar", "E:\\hadoop\\myhadoop\\out\\artifacts\\wordcount\\wordcount.jar"); // the job JAR; can also be set on the Job
conf.set("mapreduce.framework.name", "yarn");                // submit through YARN
conf.set("yarn.resourcemanager.hostname", "master");         // ResourceManager host
conf.set("mapreduce.app-submission.cross-platform", "true"); // cross-platform submission
```
If the cluster restricts HDFS access, for example so that only a designated user xxx may access it, you can set the user in the program:

```java
System.setProperty("HADOOP_USER_NAME", "xxx");
```
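A minimal sketch of that call (`HadoopUser` is an illustrative class name, and `xxx` is the placeholder user from above); note that the property should be set before the first Hadoop `FileSystem` or `Job` object is created, because the login user is resolved from it only once:

```java
public class HadoopUser {

    // Set the HDFS user before any Hadoop FileSystem/Job is created:
    // UserGroupInformation reads HADOOP_USER_NAME once, at first login.
    public static void setHadoopUser(String user) {
        System.setProperty("HADOOP_USER_NAME", user);
    }

    public static void main(String[] args) {
        setHadoopUser("xxx"); // placeholder; use the account the cluster expects
        System.out.println(System.getProperty("HADOOP_USER_NAME")); // prints xxx
    }
}
```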
2. Package the project, using either Maven or IDEA's artifact build.
- Maven

```shell
mvn package
```

- IDEA artifact
Since the cluster already has the Hadoop runtime, there is no need to bundle dependencies into the JAR; choose Empty, which also keeps builds fast while debugging.
Project Structure => Artifacts => click + in the top-left => Empty => Output Layout + => Module Output => select the project module => click the JAR and set the Main Class.
3. Set job.setJar in the program code:

```java
job.setJar("E:\\hadoop\\myhadoop\\out\\artifacts\\wordcount\\wordcount.jar");
```
4. Port 10020, which the program connects to, is the Hadoop JobHistory service; it must be started on the server:

```shell
mr-jobhistory-daemon.sh start historyserver &  # start the history server
```
5. Run the program in IDEA to submit the job; with this submission method you can also debug the source code from within IDEA.
6. Automatically upload the JAR to the cluster (optional)
Tools -> Deployment -> Configuration, click + in the top-left, choose SFTP as the Type, then fill in the server IP, deployment path, user name, password, and so on. If you enable automatic deployment, every change is uploaded to the server automatically; you can also right-click and choose Deployment -> Upload to …
Common problems:
Problem 1:
```
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: Could not locate Hadoop executable: E:\hadoop-2.8.0\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:716)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:250)
    at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:267)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:771)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:515)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:555)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:533)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:313)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:133)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
    at WordCount.main(WordCount.java:92)
Caused by: java.io.FileNotFoundException: Could not locate Hadoop executable: E:\hadoop-2.8.0\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
    at org.apache.hadoop.util.Shell.getQualifiedBinInner(Shell.java:598)
    at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:572)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:669)
    at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:441)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:487)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at WordCount.main(WordCount.java:71)

Process finished with exit code 1
```
Solution: put winutils.exe in the $HADOOP_HOME/bin directory.
Problem 2:
```
2017-08-04 12:31:00,668 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-08-04 12:31:01,230 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1181)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-08-04 12:31:01,230 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(79)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-08-04 12:31:01,495 WARN [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-08-04 12:31:01,542 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(289)) - Total input files to process : 1
2017-08-04 12:31:01,870 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(200)) - number of splits:1
2017-08-04 12:31:02,104 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1047774324_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:606)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:958)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:203)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:314)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:377)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:171)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:758)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:242)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
    at WordCount.main(WordCount.java:92)
2017-08-04 12:31:02,167 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(251)) - Cleaning up the staging area file:/tmp/hadoop/mapred/staging/alex1047774324/.staging/job_local1047774324_0001
```
Solution: hadoop.dll is missing; put hadoop.dll in the $HADOOP_HOME/bin directory.
Problem 3:

```
2017-08-04 12:47:49,125 INFO [main] ipc.Client (Client.java:handleConnectionTimeout(897)) - Retrying connect to server: master/192.168.89.135:9000. Already tried 0 time(s); maxRetries=45
```

Solution: Hadoop is not running on the remote host. If it is running, check whether firewalld.service and iptables.service have been stopped (e.g. `systemctl stop firewalld`).
Problem 4:
```
2017-11-29 21:10:22,214 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(123)) - Connecting to ResourceManager at master/192.168.89.136:8032
2017-11-29 21:10:23,259 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(289)) - Total input files to process : 1
2017-11-29 21:10:24,216 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(200)) - number of splits:1
2017-11-29 21:10:24,769 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_1511957984981_0007
2017-11-29 21:10:24,984 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(296)) - Submitted application application_1511957984981_0007
2017-11-29 21:10:25,024 INFO [main] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://master:8088/proxy/application_1511957984981_0007/
2017-11-29 21:10:25,024 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1390)) - Running job: job_1511957984981_0007
2017-11-29 21:10:28,088 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1411)) - Job job_1511957984981_0007 running in uber mode : false
2017-11-29 21:10:28,090 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1418)) - map 0% reduce 0%
2017-11-29 21:10:28,164 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1431)) - Job job_1511957984981_0007 failed with state FAILED due to: Application application_1511957984981_0007 failed 2 times due to AM Container for appattempt_1511957984981_0007_000002 exited with exitCode: 1
Failing this attempt. Diagnostics: Exception from container-launch.
Container id: container_1511957984981_0007_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
For more detailed output, check the application tracking page: http://master:8088/cluster/app/application_1511957984981_0007 Then click on links to logs of each attempt. Failing the application.
2017-11-29 21:10:28,199 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1436)) - Counters: 0
Process finished with exit code 1
```
This is caused by the platform mismatch between Windows and the remote Linux cluster.
Solution: add the following to the code:

```java
conf.set("mapreduce.app-submission.cross-platform", "true");
```