Running a MapReduce Program Locally in IDEA
Source: 程序博客网 — 2024/06/11 02:20
The previous article covered running an HDFS program from IDEA. Many errors came up along the way, but after a lot of searching online and some trial and error, it finally ran correctly.
In this article we move on to debugging a MapReduce program.
Preparation:
Download Hadoop to the local Windows machine.
Download URL: https://archive.apache.org/dist/hadoop/core/stable/hadoop-2.7.3.tar.gz
After extracting it, set the environment variables:
HADOOP_HOME = D:\git-mobile-workspace\hadoop-2.7.3
Path += %HADOOP_HOME%\bin;%HADOOP_HOME%\sbin
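Setting the variables can be sketched from a Windows command prompt as follows (the path is the extraction directory assumed above — adjust it to your own machine, or set the variables through the System Properties dialog instead):

```shell
:: Assumed extraction directory -- change to match your machine.
setx HADOOP_HOME "D:\git-mobile-workspace\hadoop-2.7.3"
setx Path "%Path%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin"

:: Open a NEW command prompt (setx does not affect the current one), then verify:
hadoop version
```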
1. Directory structure
We build on the Maven project from the previous article and add the MapReduce code to it.
2. pom.xml
Identical to the previous article, so it is not shown again here.
3. The CountMain class

public class CountMain {
    public static void main(String[] args) {
        try {
            Job job = Tools.getJob();
            job.setJarByClass(CountMain.class);
            Tools.setMapper(job, CountMapper.class, Text.class, LongWritable.class);
            Tools.setReduce(job, CountReduce.class, Text.class, LongWritable.class);
            Tools.setInput(job, "/hadoop/mapred_input");
            Tools.setOutPut(job, "/hadoop/mapred_output");
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
4. The CountMapper class

public class CountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value != null) {
            // Emit each input line as a key with a count of 1.
            context.write(new Text(value.toString()), new LongWritable(1));
        }
    }
}
5. The CountReduce class

public class CountReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long total = 0;
        // Sum the counts via LongWritable.get() rather than converting through String.
        for (LongWritable value : values) {
            total += value.get();
        }
        context.write(key, new LongWritable(total));
    }
}
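Stripped of the Hadoop types, what the mapper and reducer accomplish together is an ordinary group-and-sum over the input lines. A minimal plain-Java sketch of that logic (the class and method names here are my own, not part of the project):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountSketch {
    // Group identical lines and sum their counts -- the same effect as the
    // mapper emitting (line, 1) and the reducer summing the 1s per key.
    public static Map<String, Long> count(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            totals.merge(line, 1L, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("46462", "46494", "46462")));
        // prints {46462=2, 46494=1}
    }
}
```

MapReduce distributes exactly this computation: the shuffle phase does the grouping, and the reducer does the summing.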
6. The Tools class

public class Tools {

    public static final Configuration configuration = new Configuration();

    static {
        configuration.set("fs.defaultFS", "hdfs://192.168.178.130:9000");
        System.setProperty("HADOOP_USER_NAME", "root");
    }

    public static Job getJob() {
        Job job = null;
        try {
            job = Job.getInstance(configuration);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return job;
    }

    public static void setMapper(Job job, Class<? extends Mapper> mapperClass,
                                 Class<?> keyClass, Class<?> valueClass) {
        job.setMapperClass(mapperClass);
        job.setMapOutputKeyClass(keyClass);
        job.setMapOutputValueClass(valueClass);
    }

    public static void setReduce(Job job, Class<? extends Reducer> reduceClass,
                                 Class<?> keyClass, Class<?> valueClass) {
        job.setReducerClass(reduceClass);
        // These are the job's final output key/value classes
        // (setOutputKeyClass), not the map output classes.
        job.setOutputKeyClass(keyClass);
        job.setOutputValueClass(valueClass);
    }

    public static void setInput(Job job, String path) {
        try {
            FileInputFormat.addInputPath(job, new Path(path));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void setOutPut(Job job, String path) {
        FileOutputFormat.setOutputPath(job, new Path(path));
    }
}
Now let's run the program.
17/05/19 10:25:20 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable D:\git-mobile-workspace\hadoop-2.7.3\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:67)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:101)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:72)
at com.hadoop.utils.Tools.getJob(Tools.java:35)
at com.hadoop.mapreduce.CountMain.main(CountMain.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
17/05/19 10:25:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/19 10:25:21 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/19 10:25:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
java.io.IOException: (null) entry in command string: null chmod 0700 D:\tmp\hadoop-chenzhongwei\mapred\staging\root1481997474\.staging
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:770)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:491)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:532)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:982)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
at com.hadoop.mapreduce.CountMain.main(CountMain.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Sure enough, it failed.
The error in the log says the winutils.exe file cannot be found. After some searching online, the fix is to download winutils.exe (copies are easy to find) and put it in the bin directory of the local Hadoop installation.
Note: it must match your Hadoop version, otherwise the error will persist.
Running the main method again, the error changed.
17/05/19 22:40:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/19 22:41:00 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/19 22:41:00 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/05/19 22:41:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/05/19 22:41:00 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/05/19 22:41:00 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-chenzhongwei/mapred/staging/root1952540668/.staging/job_local1952540668_0001
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.178.130:9000/hadoop/mapred_input
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1107)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1124)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:178)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1023)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
at com.hadoop.mapreduce.CountMain.main(CountMain.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
So we first need to create the input directory (/hadoop/mapred_input) on HDFS. The main method of HdfsTest from the previous article does the job, so that code is not repeated here.
Running the main method now, the environment errors are gone and the log output is normal.
Next, upload a file a.txt to the /hadoop/mapred_input directory.
Contents of a.txt:
46494
46494
46494
46462
46462
46500
46500
46494
46462
46462
46462
46538
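Alternatively, the directory creation and upload can be done from a shell on the Hadoop machine with the hdfs CLI (a sketch; the paths match the job configuration above, and a.txt is assumed to be in the current directory):

```shell
# Create the input directory on HDFS and upload the sample file.
hdfs dfs -mkdir -p /hadoop/mapred_input
hdfs dfs -put a.txt /hadoop/mapred_input/

# Confirm the file landed where the job expects it.
hdfs dfs -ls /hadoop/mapred_input
```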
Run the main method — and it fails again.
17/05/19 22:53:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/19 22:53:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/19 22:53:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/05/19 22:53:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/05/19 22:53:34 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/05/19 22:53:34 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-chenzhongwei/mapred/staging/root1303983606/.staging/job_local1303983606_0001
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /hadoop/mapred_output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1015)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
at com.hadoop.mapreduce.CountMain.main(CountMain.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
This error tells us the output path already exists.
The fix is to delete that directory from code each time the main method runs.
Add the following method to the Tools class:
public static void deleteOutPutDir(String dir) {
    Path path = new Path(dir);
    try {
        FileSystem fileSystem = path.getFileSystem(configuration);
        if (fileSystem.exists(path)) {
            fileSystem.delete(path, true);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Call it in CountMain's main method before job.waitForCompletion(true):
public class CountMain {
    public static void main(String[] args) {
        try {
            Job job = Tools.getJob();
            job.setJarByClass(CountMain.class);
            Tools.setMapper(job, CountMapper.class, Text.class, LongWritable.class);
            Tools.setReduce(job, CountReduce.class, Text.class, LongWritable.class);
            Tools.setInput(job, "/hadoop/mapred_input");
            Tools.setOutPut(job, "/hadoop/mapred_output");
            Tools.deleteOutPutDir("/hadoop/mapred_output");
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Run the main method — and it fails yet again.
17/05/19 11:59:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/19 11:59:43 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/05/19 11:59:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/05/19 11:59:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/05/19 11:59:43 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/05/19 11:59:43 INFO input.FileInputFormat: Total input paths to process : 1
17/05/19 11:59:44 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/05/19 11:59:44 INFO mapred.JobClient: Running job: job_local615107312_0001
17/05/19 11:59:44 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/05/19 11:59:44 INFO mapred.LocalJobRunner: Waiting for map tasks
17/05/19 11:59:44 INFO mapred.LocalJobRunner: Starting task: attempt_local615107312_0001_m_000000_0
17/05/19 11:59:44 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/05/19 11:59:44 INFO mapred.Task: Using ResourceCalculatorPlugin : null
17/05/19 11:59:44 INFO mapred.MapTask: Processing split: hdfs://192.168.178.130:9000/hadoop/mapred_input/a.txt:0+82
17/05/19 11:59:44 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/05/19 11:59:44 INFO mapred.MapTask: io.sort.mb = 100
17/05/19 11:59:44 INFO mapred.MapTask: data buffer = 79691776/99614720
17/05/19 11:59:44 INFO mapred.MapTask: record buffer = 262144/327680
17/05/19 11:59:44 INFO mapred.LocalJobRunner:
17/05/19 11:59:44 INFO mapred.MapTask: Starting flush of map output
17/05/19 11:59:44 INFO mapred.LocalJobRunner: Map task executor complete.
17/05/19 11:59:44 WARN mapred.LocalJobRunner: job_local615107312_0001
java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:125)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1270)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1174)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:609)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:675)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
17/05/19 11:59:45 INFO mapred.JobClient: map 0% reduce 0%
17/05/19 11:59:45 INFO mapred.JobClient: Job complete: job_local615107312_0001
17/05/19 11:59:45 INFO mapred.JobClient: Counters: 7
17/05/19 11:59:45 INFO mapred.JobClient: Map-Reduce Framework
17/05/19 11:59:45 INFO mapred.JobClient: Map input records=12
17/05/19 11:59:45 INFO mapred.JobClient: Map output records=12
17/05/19 11:59:45 INFO mapred.JobClient: Map output bytes=168
17/05/19 11:59:45 INFO mapred.JobClient: Input split bytes=118
17/05/19 11:59:45 INFO mapred.JobClient: Combine input records=0
17/05/19 11:59:45 INFO mapred.JobClient: Combine output records=0
17/05/19 11:59:45 INFO mapred.JobClient: Spilled Records=0
More searching revealed that the archive containing winutils.exe should also include a hadoop.dll file; copy it into the bin directory of the local Hadoop folder as well.
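Placing both native helpers can be sketched as follows (Windows commands; the download directory is hypothetical — use wherever you extracted the winutils archive):

```shell
:: Hypothetical download location -- adjust to your own.
copy D:\downloads\winutils-2.7.3\winutils.exe %HADOOP_HOME%\bin\
copy D:\downloads\winutils-2.7.3\hadoop.dll %HADOOP_HOME%\bin\

:: Some setups also need hadoop.dll on the system path, e.g. C:\Windows\System32.
```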
Now run the main method once more. This time it completes: two files are generated in the output directory, and part-r-00000 holds the computed result.
Download it and view the contents:
46462 5
46494 4
46500 2
46538 1
With that, we have succeeded!