Running MapReduce on Windows
For the environment setup, see this article: http://blog.csdn.net/baolibin528/article/details/43868477
The code:
package mapreduce;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class Mapreduce {
    static final String INPUT_PATH = "hdfs://192.168.1.100:9000/input/text01";
    static final String OUT_PATH = "hdfs://192.168.1.100:9000/output/out01";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        final Job job = new Job(conf, Mapreduce.class.getSimpleName());

        // 1.1 Specify where the input file is located
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input file is formatted: each line is parsed into a key-value pair
        job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Specify the custom map class
        job.setMapperClass(MyMapper.class);
        // The <k,v> types of the map output. If <k3,v3> has the same types as <k2,v2>, these can be omitted
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        // Run with a single reduce task
        job.setNumReduceTasks(1);

        // 1.4 TODO sorting and grouping
        // 1.5 TODO combining

        // 2.2 Specify the custom reduce class
        job.setReducerClass(MyReducer.class);
        // Specify the output types of the reduce
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        // Specify the output format class
        job.setOutputFormatClass(TextOutputFormat.class);

        // Submit the job to the JobTracker and wait for it to finish
        job.waitForCompletion(true);
    }

    /**
     * KEYIN    i.e. k1: the byte offset of the line
     * VALUEIN  i.e. v1: the text content of the line
     * KEYOUT   i.e. k2: a word appearing in the line
     * VALUEOUT i.e. v2: the count of that word in the line, fixed at 1
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws java.io.IOException, InterruptedException {
            final String[] splited = v1.toString().split(" ");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    /**
     * KEYIN    i.e. k2: a word appearing in a line
     * VALUEIN  i.e. v2: the count of that word in the line
     * KEYOUT   i.e. k3: a distinct word appearing in the text
     * VALUEOUT i.e. v3: the total count of that distinct word in the text
     */
    static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, java.lang.Iterable<LongWritable> v2s, Context ctx)
                throws java.io.IOException, InterruptedException {
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
            }
            ctx.write(k2, new LongWritable(times));
        }
    }
}
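Stripped of the Hadoop types, the map/reduce logic above amounts to splitting each line on spaces, emitting a 1 per word, and summing the 1s per word. A minimal plain-Java sketch of that logic (the class and method names here are illustrative, not part of the job above):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // The map step (emit (word, 1)) fused with the reduce step
    // (sum the 1s per word), without the Hadoop framework.
    static Map<String, Long> count(String[] lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1L, Long::sum); // add 1 to the running total
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> c = count(new String[] { "hello world", "hello hadoop" });
        System.out.println(c); // "hello" maps to 2, "world" and "hadoop" to 1
    }
}
```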
Copy the Java class located at the following path into the project directory:
hadoop-1.2.1\src\core\org\apache\hadoop\fs
In the copied code, use the Ctrl+F shortcut to locate the relevant section quickly, and comment out certain contents.
The commented-out content is as follows:
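The original screenshots of the change are not reproduced here. Judging from the stack trace shown further below (which fails in FileUtil.checkReturnValue), the change commonly applied is to comment out the body of checkReturnValue in the copied FileUtil.java; this sketch is an assumption, not a verbatim copy of the Hadoop 1.2.1 source:

```java
// In the copied org/apache/hadoop/fs/FileUtil.java (assumed sketch):
// the method body is commented out so the permission check no longer
// throws "Failed to set permissions of path" on Windows.
private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission)
        throws IOException {
    /*
    if (!rv) {
        throw new IOException("Failed to set permissions of path: " + p +
                              " to " +
                              String.format("%04o", permission.toShort()));
    }
    */
}
```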
If you do not copy that Java class, or copy it without commenting out the content above, the following error occurs at run time:
The exception is:
15/02/18 12:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/18 12:37:31 ERROR security.UserGroupInformation: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator986015421\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator986015421\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at mapreduce.Mapreduce.main(Mapreduce.java:61)
View the HDFS input file contents:
Check the output; the output folder has been created successfully:
View the two files produced by the job just run:
View the final result:
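The original screenshots of these checks are not reproduced here; the same inspection can be done from the command line. The paths match the job configuration above, and part-r-00000 is the default name Hadoop gives the first reduce task's output (the other file in the directory is the empty _SUCCESS marker):

```shell
# View the HDFS input file contents
hadoop fs -cat hdfs://192.168.1.100:9000/input/text01
# List the files the job produced in the output folder
hadoop fs -ls hdfs://192.168.1.100:9000/output/out01
# View the final word-count result
hadoop fs -cat hdfs://192.168.1.100:9000/output/out01/part-r-00000
```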
The console output is as follows:
15/02/18 13:05:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/18 13:05:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/02/18 13:05:53 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/02/18 13:05:53 INFO input.FileInputFormat: Total input paths to process : 1
15/02/18 13:05:53 WARN snappy.LoadSnappy: Snappy native library not loaded
15/02/18 13:05:53 INFO mapred.JobClient: Running job: job_local1096498984_0001
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Starting task: attempt_local1096498984_0001_m_000000_0
15/02/18 13:05:53 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/02/18 13:05:53 INFO mapred.MapTask: Processing split: hdfs://192.168.1.100:9000/input/text01:0+154
15/02/18 13:05:53 INFO mapred.MapTask: io.sort.mb = 100
15/02/18 13:05:53 INFO mapred.MapTask: data buffer = 79691776/99614720
15/02/18 13:05:53 INFO mapred.MapTask: record buffer = 262144/327680
15/02/18 13:05:53 INFO mapred.MapTask: Starting flush of map output
15/02/18 13:05:53 INFO mapred.MapTask: Finished spill 0
15/02/18 13:05:53 INFO mapred.Task: Task:attempt_local1096498984_0001_m_000000_0 is done. And is in the process of commiting
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task 'attempt_local1096498984_0001_m_000000_0' done.
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local1096498984_0001_m_000000_0
15/02/18 13:05:53 INFO mapred.LocalJobRunner: Map task executor complete.
15/02/18 13:05:53 INFO mapred.Task: Using ResourceCalculatorPlugin : null
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Merger: Merging 1 sorted segments
15/02/18 13:05:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 416 bytes
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task:attempt_local1096498984_0001_r_000000_0 is done. And is in the process of commiting
15/02/18 13:05:53 INFO mapred.LocalJobRunner:
15/02/18 13:05:53 INFO mapred.Task: Task attempt_local1096498984_0001_r_000000_0 is allowed to commit now
15/02/18 13:05:53 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1096498984_0001_r_000000_0' to hdfs://192.168.1.100:9000/output/out01
15/02/18 13:05:53 INFO mapred.LocalJobRunner: reduce > reduce
15/02/18 13:05:53 INFO mapred.Task: Task 'attempt_local1096498984_0001_r_000000_0' done.
15/02/18 13:05:54 INFO mapred.JobClient: map 100% reduce 100%
15/02/18 13:05:54 INFO mapred.JobClient: Job complete: job_local1096498984_0001
15/02/18 13:05:54 INFO mapred.JobClient: Counters: 19
15/02/18 13:05:54 INFO mapred.JobClient:   File Output Format Counters
15/02/18 13:05:54 INFO mapred.JobClient:     Bytes Written=96
15/02/18 13:05:54 INFO mapred.JobClient:   File Input Format Counters
15/02/18 13:05:54 INFO mapred.JobClient:     Bytes Read=154
15/02/18 13:05:54 INFO mapred.JobClient:   FileSystemCounters
15/02/18 13:05:54 INFO mapred.JobClient:     FILE_BYTES_READ=734
15/02/18 13:05:54 INFO mapred.JobClient:     HDFS_BYTES_READ=308
15/02/18 13:05:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=139904
15/02/18 13:05:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=96
15/02/18 13:05:54 INFO mapred.JobClient:   Map-Reduce Framework
15/02/18 13:05:54 INFO mapred.JobClient:     Map output materialized bytes=420
15/02/18 13:05:54 INFO mapred.JobClient:     Map input records=3
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/02/18 13:05:54 INFO mapred.JobClient:     Spilled Records=52
15/02/18 13:05:54 INFO mapred.JobClient:     Map output bytes=362
15/02/18 13:05:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=323878912
15/02/18 13:05:54 INFO mapred.JobClient:     Combine input records=0
15/02/18 13:05:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=103
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce input records=26
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce input groups=12
15/02/18 13:05:54 INFO mapred.JobClient:     Combine output records=0
15/02/18 13:05:54 INFO mapred.JobClient:     Reduce output records=12
15/02/18 13:05:54 INFO mapred.JobClient:     Map output records=26